B.M. Shchigolev 


MATHEMATICAL 
ANALYSIS OF 
OBSERVATIONS 




Translated by Scripta Technica Inc 
Editor H. EAGLE 


LONDON ILIFFE BOOKS LTD 


NEW YORK AMERICAN ELSEVIER PUBLISHING 
COMPANY INC. 


Originally published in the U.S.S.R. in 1960 


English edition first published in 1965 
by Iliffe Books Ltd, Dorset House 
Stamford Street, London, S.E.1 


© Iliffe Books Ltd, 1965
Published in the U.S.A. by 
American Elsevier Publishing Company Inc. 

52 Vanderbilt Avenue, New York, N.Y. 10017 
Library of Congress Catalog Card Number 65-36710 
Printed and bound in England by 
Butler & Tanner Ltd 
Frome, Somerset 


BKS 5052 


CONTENTS

Introduction

Part I
Operations With Approximate Numbers

Chapter 1. Estimation of Errors of Approximate Numbers
1. Fundamental Problems in the Theory of Approximate Calculations
2. The Exact Error of an Approximate Number
3. Limiting Absolute Error
4. Limiting Relative Error
5. Estimate of an Error from the Number of Known Digits

Chapter 2. Errors Incurred in Fundamental Arithmetic Operations
6. Addition
7. Statistical Estimate of the Error of a Sum
8. Subtraction of Close Numbers
9. Multiplication
10. Division

Chapter 3. Estimate of the Error in a Function with Approximate Arguments
11. Limiting Errors of a Function of a Single Independent Variable
12. Errors in the Simpler Elementary Functions
13. The Error Involved in Functions of Several Variables
14. The Concept of the Inverse Problem in the Theory of Approximate Calculations

Part II
Point Interpolation

Chapter 4. General Remarks
15. The Approximation of Tabulated Functions. The Concept of Point Interpolation
16. A Theorem on the Existence of an Interpolational Polynomial
17. Lagrange’s Interpolational Polynomial
18. Estimate of the Error in Point Interpolation

Chapter 5. Interpolation from a Table with a Variable Step
19. Difference Quotients of Tabulated Functions
20. The Construction of Interpolational Formulas Regarding Differences
21. Newton’s Interpolational Formula for a Table with Variable Step

Chapter 6. Interpolation from a Table with a Constant Step
22. Ordinary and Central Differences of a Tabulated Function with a Constant Step
23. The Basic Properties of Ordinary Differences
24. The Method of Constructing Interpolational Formulas for Tables with a Constant Step
25. Newton’s Formulas for Interpolating Forward and Backward
26. Stirling’s Formula
27. Bessel’s Formula (Two Variants)
28. General Remarks About the Application of the Interpolational Difference Formulas

Part III
Probability Theory

Chapter 7. Random Events; Basic Concepts
29. Random Experiments
30. The Classical Definition of Probability
31. Examples of the Calculation of a Probability
32. A Theorem on the Addition of Probabilities
33. The Theorem on the Multiplication of Probabilities
34. Total Probability Hypotheses
35. A Priori and a Posteriori Probabilities of the Hypotheses

Chapter 8. The Problem of Repeated Trials
36. Statement of the Problem and Derivation of the Basic Formula
37. The Probability Distribution for the Number of Times that an Event May Occur
38. Laplace’s Approximation Formula for Calculating the Probabilities of the Possible Number of Times of Occurrence of an Event
39. An Approximating Curve for the Probability Distribution
40. Poisson’s Distribution (The Law of Rare Events)

Chapter 9. Discrete Random Variables
41. Random Variables
42. The Expectation of a Discrete Random Variable
43. Theorems on the Addition and Multiplication of Expectations
44. The Variance of a Random Variable and its Properties
45. Expectation and Variance of the Number of Occurrences

Chapter 10. The Law of Large Numbers
The Chebyshev-Markov Lemma
Theorem of J. Bernoulli
Inequalities and Chebyshev’s Theorem
Comments on the Law of Large Numbers. Statistical Probabilities

Chapter 11. Continuous Random Variables
51. The Distribution Function of a Continuous Random Variable
52. Probability Density
53. Expectation, Variance, and Moments
54. Uniform Probability Distribution
55. Formulation of Lyapunov’s Theorem. The Normal Probability Distribution
56. An Approximate Derivation of the Normal Law
57. Parameters of the Normal Law. Gauss’ Curve
58. A Function of a Normal Distribution. Calculation of Probabilities
59. The Moments of a Normal Distribution
60. Distributions other than the Normal

Chapter 12. Joint Probability Distribution of Two Continuous Random Variables
61. The Joint Probability Density of Two Variables
62. Conditional Probability Density
63. The Normal Distribution of Two Random Variables
64. The Probability Density of a Normal Distribution

Part IV
Fundamentals of the Theory of Random Measurement Errors

Chapter 13. General Remarks on Measurement Errors
65. Types of Measurement Errors
66. The Basic Hypothesis in the Theory of Random Errors. Methods of Evaluating Errors

Chapter 14. Analysis of Equally Accurate Measurements of a Fixed Quantity
67. The Problem of Analyzing Measurements of a Fixed Quantity
68. The Most Probable Value of a Measured Quantity. The Method of Least Squares
69. The Mean Square Error of the Arithmetic Mean
70. The Most Probable Value of the Mean Square Error of an Individual Measurement
71. A Second Derivation of the Approximate Value of a Measured Quantity and of the Approximate Value of the Mean Square Error of an Individual Measurement
72. An Example of the Analysis of Equally Precise Measurements of a Single Fixed Quantity

Chapter 15. Analysis of Measurements Which Are Not Equally Precise
73. The Concept of Non-Equally-Precise Measurements. Weighted Measurements
74. The Most Probable Value of the Measured Quantity
75. The Mean Square Error of the Weighted Mean
76. The Most Probable Value of the Mean Square Error of a Measurement of Unit Weight
77. The Procedure for Analyzing Unequally Precise Measurements of a Fixed Quantity. An Example

Chapter 16. Determination of Several Unknowns in Equations by the Method of Least Squares
78. Conditional and Normal Equations. Legendre’s Principle
79. The Probabilistic Meaning of Legendre’s Principle
80. Generalization of Legendre’s Principle to Non-Equally-Precise to Equally Precise Equations
81. The Reduction of Nonlinear Conditional Equations to Linear Form
82. Linear Conditional and Normal Equations
83. A Check on the Setting Up of the Normal Equations
84. The Solution of a System of Linear Normal Equations
85. Calculation of the Weights of the Unknowns
86. The Approximate Value of the Mean Square Error Per Unit Weight. The Mean Square Error of the Unknowns
87. An Example Illustrating the Procedure for Solving Systems of Linear Conditional Equations

Chapter 17. Empirical Formulas
88. Statement of the Problem
89. Choice of the Type of Formula
90. The Use of Legendre’s Principle for Determining the Values of the Parameters
91. Checking of Empirical Formulas
92. An Illustration of the Derivation of an Empirical Formula

Part V
Analysis of Statistical Material

Chapter 18. Analysis of a One-Dimensional Statistical Set
93. Statistical Sets
94. A Discrete Empirical Distribution and its Numerical Characteristics
95. Continuous Empirical Distributions
96. Comparison of the Empirical and the Theoretical Distributions
97. Confidence Probabilities and Confidence Limits
98. Graphical Representation of an Empirical Set
99. The Average Errors of the Parameters of a Sample Set

Chapter 19. Elementary Theory of the Correlation of Two Variables
The Empirical Distribution of Two Random Variables
Correlational Dependence. Problems in the Theory of Correlation
Derivation of a Linear Empirical Formula
Derivation of the Linear Equations of Regression
The Correlation Coefficient
The Average Errors in the Equations of Regression
Bounds for the Values of the Correlation Coefficient
Average Errors of Sample Coefficients of Correlation and Regression
The Probabilistic Significance of the Elementary Theory of Correlation
The Procedure for Investigating the Correlation in the Case of a Large Number of Observations. An Example
An Example of Investigating the Correlation from a Small Number of Observations

Bibliography

Appendix


PREFACE 


This book was written as a text for courses in “Mathematical
Analysis of Observations” for astronomy students in the applied
mathematics and physics divisions of universities. The book is
based on lectures given at the Moscow State University and contains 
the material included in the semester program, as well as supple- 
mentary material not part of the course. 

The subject matter is somewhat more extensive than the title 
would indicate since it includes not only problems connected with the 
processing of observations in a strict sense of the word but also 
problems on approximate calculations. These latter do not always 
involve analysis of observations, although it is convenient to solve 
them in connection with such a study. For example, the book discusses
the problem of point interpolation from tables of a function
whose values are computed from its definition (e.g., by means of a 
series). 

The term “Mathematical Analysis of Observations” is thus
intended in its broadest sense.

The problems considered in this book not only are applicable to 
astronomy but quite frequently need to be solved in various other 
branches of science and technology. Therefore, the author hopes 
that at least certain parts of the book will be useful to others besides 
astronomers. 

The book is divided into parts and chapters. The chapters and 
sections are numbered from the beginning of the book. A double 
system of numbering was chosen for the formulas: Formula (16.10), 
for example, is the tenth formula in Chapter 16. 

A small bibliography appears at the end of the book. It consists 
mostly of textbooks, and is divided into five parts, each part apply- 
ing to one of the parts of the present book. 

The author considers it his pleasant duty to thank his colleagues
in the Department of Celestial Mechanics at Moscow State University,
E. M. Slavtsev and A. I. Rybakov, for their help in preparing
the manuscript for printing.


B. M. Shchigolev 




INTRODUCTION 


Observations and experiments constitute the basis of all natural 
science. Observations and experiments that provide numbers (the
results of measurements) are of special significance. A proper
analysis of these numbers leads to a theoretical interpretation of 
the results of the observations and to the final goal of natural 
science, namely, the establishment of laws governing these phenom- 
ena, thus making possible the prediction of the future behavior of 
the phenomena.

All results of measurements contain errors of various origins. 
Therefore, the results of calculations with numbers, which in turn 
are the results of measurements, also contain errors. From a
practical standpoint, it is very important to be able to estimate 
both the errors incurred in making the measurements and the 
errors resulting from operations on those measurements, because 
it is only then that we can safely use the conclusions drawn from
observations.

It is no less important to organize the calculations and the ob- 
servations in such a way as to ensure as small an error in the 
result as possible. All the information that we have on linear
dimensions in the solar system, in the galaxy, etc. is based, in
the last analysis, on direct measurements made on comparatively 
small quantities on the surface of the earth. These values contain 
errors. In order to obtain information on dimensions in the solar 
system, and (even to a greater extent) in the galaxy, we need to 
multiply these values by large numbers. The errors in measure- 
ments are then multiplied and this leads to large errors in the re- 
sults. From these brief remarks, it is clear that the analysis of 
the results of observations cannot be carried out in an arbitrary 
manner. In order to have the results contain as small errors as 
possible, we need to develop both methods of estimating the errors 
and methods of computation that will ensure the most accurate 
results possible. 

These remarks have to do with the analysis of observations in a 
narrow sense of the word, referring to operations on numbers that 
are obtained directly from observations. However, in forming a 
theory regarding phenomena and in computing quantities that are not
directly observed but that are derived from the analysis of observations,
it is necessary to use various mathematical devices; in particular,
it is necessary to use various functional relations extensively.




As we know, a function can be defined in several different ways. 
In the simplest case, we are told what arithmetic operations must 
be performed on the values of the argument(s) in order to obtain the 
value of the function (a polynomial, a rational function, etc.). But a
function may also be defined in such a way that we do not see from 
the definition how to calculate its value (e.g., the arcsine). In such 
cases, the definition is used to find those properties of the function 
that make possible its expansion in an infinite series; this infinite 
series may then be considered a new definition of the function. A 
function may be defined also by an integral, by a differential equation,
etc. None of these methods of defining the function states
directly how to calculate the value of the function from the value of 
the argument. In such cases, we need either to find an infinite 
series or to resort to numerical methods of solution that give the 
function in the form of a table. In those cases in which a function is
determined by a convergent infinite series (usually a power series),
that series is used for compiling a table of values for the function 
(especially if the function is encountered very often in practical 
problems). The compilation of a table is always an approximate 
operation since one must always cut off an infinite series at some 
term or other. Although tabular values of functions are not obtained
from measurements, they contain unavoidable errors just as do the 
results of measurements. These errors must also be estimated. 
It is also necessary to estimate the errors incurred in performing 
operations on the tabular values of functions. 

Thus, there are common features in the problems connected 
with measurements and in the problems of using tables of functions. 
Both types of problem are directly connected in that quite often the 
result of a measurement is the argument of a tabulated function. 
For these reasons, courses on the analysis of measurements 
usually include problems on operations on tabulated functions. 

Among the errors of measurement a conspicuous place is 
occupied by random errors, that is, errors whose values cannot be 
estimated before the observations. We might also note that they 
cannot be evaluated even after observations, since the presence of 
random errors makes it impossible for us to determine the exact 
value of the quantity measured. In analyzing measurements con- 
taining random errors, one must use the theory of probability, 
which is also necessary in statistical work. This course in the 
mathematical analysis of observations includes the basic principles 
of probability theory so that the student may use a single textbook 
and have a single system of notation, terminology, etc. 

Problems involving the analysis of observations and the somewhat
allied problems of approximating the value of a function arose
long ago, primarily in connection with astronomical problems. 
They were first stated in a completely explicit form in the works of 
the French mathematician Legendre (1752-1833) and the German 
mathematician Gauss (1777-1855). 

The Russian mathematicians P. L. Chebyshev (1821-1894), 
A. A. Markov (1856-1922), and A. M. Lyapunov (1857-1918) played 
a special part in the development of the theory of errors and the
theory of approximation of a function. The founders of the Soviet
school of probability theory and of the constructive theory of 
functions, namely, S. N. Bernstein, B. V. Gnedenko, V. L. Gon- 
charov, A. N. Kolmogorov, V. I. Romanovskii, A. Ya. Khinchin, and 
others, have conducted and are still conducting interesting research 
in these fields. 


Part I 


OPERATIONS WITH APPROXIMATE 
NUMBERS 


Chapter 1 


ESTIMATION OF ERRORS OF 
APPROXIMATE NUMBERS 


1. FUNDAMENTAL PROBLEMS IN THE THEORY OF 
APPROXIMATE CALCULATIONS 


In the natural sciences, one has occasion to deal with exact 
numbers only very rarely. If an astronomer is investigating 
motions in a three-star system, the number 3 is, of course, exact. 
When a physicist is studying the structure of snowflakes, he can 
determine exactly the number of rays on each individual snowflake. 
However, the number of such examples is very limited. 

The results of measurements are always approximate, primarily 
because of the limited accuracy of measuring instruments. 

Every measuring instrument has a scale and the intervals be- 
tween the dividing lines on this scale cannot be arbitrarily small. 
We sometimes speak of the “threshold of sensitivity” of the instrument.
This is the smallest change in value that can be registered
by the instrument. For example, if a circle designed for measuring
angles has dividing lines every 10′ and a vernier giving the individual
minutes, the threshold of sensitivity of the instrument is 1′,
since a change in angle of 1′ can be detected by the instrument,
while a smaller change, though it can be perceived, cannot be accurately
determined. If the needle of the instrument falls between
two divisions, the tenth parts of the interval can still be evaluated 
either by the eye or with the help of the vernier. Thus, we may 
assert only that the errors in measurement are less than one tenth 
of the interval between the divisions of the scale. This assertion 
will be valid if there are no other sources of error besides the 
error caused by the limited accuracy of the measuring device. 

There are a good many measurements, especially laboratory 
measurements, with regard to which we can find an upper bound on 
the absolute value of the error of measurement. We shall examine 
such cases below. Furthermore (quite apart from the question of 
the possibility of determining this upper bound accurately), it is 
always possible to assume the existence of this upper bound. Indeed,
since an exact numerical value of the quantity measured exists as 
an objective reality independent of us, and since measurements 
give, generally speaking, some other value, the error is always
bounded. In this part of the book, we shall assume that we are
dealing with approximate numbers containing errors of arbitrary 
origin but for which we can find an upper bound on the absolute 
value of the error. 

The first problem in the theory of approximate calculations 
will be to establish means of estimating errors. 

We shall almost always need to perform arithmetic or other 
operations on approximate numbers. Of course, the results of 
these operations will also be of an approximate nature. We then 
encounter the second problem, that of estimating the error in- 
curred in performing these operations if we already have estimates 
on the errors in the original numbers. We may call this problem 
the direct problem of the theory. If such a problem is solved in 
literal form, we can then set up two other very important problems:

(1) the inverse problem, that of determining the degree of ac- 
curacy in the original numbers that is necessary to ensure a 
specified degree of accuracy in the result of the operations per- 
formed; 

(2) determining the conditions of measurements or calculations 
under which the error in the result of the operations will be as 
small as possible. We may vary either the choice of circum- 
stances under which the measurements are made or the choice of 
formulas used in making calculations. 


2. THE EXACT ERROR OF AN APPROXIMATE NUMBER 


Let us suppose that a certain quantity (for example, an angle) 
has a definite numerical value A that remains unchanged during 
the entire process of measurement. Let us suppose also that the 
measurement made of this quantity yields the value a. 

The difference between the exact and approximate values 


$$A - a = \Delta_a, \tag{1.1}$$


is called the exact error of the approximate number a. This 
definition is convenient in that the concept of an exact error coin- 
cides with the concept of a correction since 


$$A = a + \Delta_a, \tag{1.2}$$


that is, the exact error is the number that must be added to the 
approximate number in order to obtain the exact value.* 


*Instead of an exact error, the concept of an absolute error is sometimes used. The
absolute error $|\Delta|_a$ of an approximate number $a$ is the absolute value of the difference
between the exact and approximate values of the quantity:

$$|\Delta|_a = |A - a|.$$

We shall not be using this concept.


The concept of an “exact error” is only of theoretical significance,
since in the more common problems this error cannot in
actuality be determined. We might only mention certain exceptional
cases in which we wish to investigate the accuracy of measurements
made by some technique or with the help of some instrument,
and, to do so, we measure the same quantities in some other way
(for example, with precision instruments) with an accuracy that is
considerably greater than the accuracy of the technique or device
for measuring that we are investigating. Although the more precise
measurement also contains an error, we may assume that the second
measurement yields a formally exact value and determine the
“exact” error of the first measurement, confining ourselves to a
certain number of significant figures.

Suppose, for example, that a measurement with a theodolite
gives a value of 38°43′ with an accuracy of 1′. Suppose that we
measured the same angle with a universal compass with a maximum
error of 5″ and that the result of this measurement is 38°43′24″.
Formally considering the second value as exact, we may say
that the exact error of the first measurement is 0.4′, although in
actuality this is only an approximate value of the exact error.
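
The arithmetic of this example can be checked with a short sketch. It assumes that the second reading is 38°43′24″, which is the value consistent with the stated error of 0.4′; the helper name `dms_to_arcsec` is hypothetical and is introduced only for illustration.

```python
def dms_to_arcsec(degrees, minutes=0, seconds=0):
    """Convert an angle in degrees, minutes, seconds to whole arcseconds."""
    return (degrees * 60 + minutes) * 60 + seconds

# Theodolite reading a (good to 1'), and the more precise reading,
# which is formally treated as the exact value A (an assumption of the example).
rough = dms_to_arcsec(38, 43)
precise = dms_to_arcsec(38, 43, 24)

# Exact error in the sense of (1.1): Delta_a = A - a.
exact_error_arcsec = precise - rough
exact_error_arcmin = exact_error_arcsec / 60

print(exact_error_arcsec)  # 24
print(exact_error_arcmin)  # 0.4
```

The sign of the result matters: a positive value means that the correction must be added to the rough reading, in agreement with (1.2).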

In what follows, we shall assume that the quantity $|\Delta_a|$ is so
small in comparison with $|a|$ that we can neglect powers of $\Delta_a$
higher than the first. If we are considering several approximate
numbers $a$, $b$, ... at the same time, we shall also neglect products
of the form $\Delta_a \Delta_b$.
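
As a brief illustration of what this approximation amounts to, consider the product of two exact values. The symbols $B$, $b$, and $\Delta_b$ are introduced here by analogy with (1.2) and are assumptions of this sketch; multiplication itself is treated in Chapter 2.

```latex
\begin{align*}
A B &= (a + \Delta_a)(b + \Delta_b) \\
    &= ab + a\,\Delta_b + b\,\Delta_a + \Delta_a \Delta_b \\
    &\approx ab + a\,\Delta_b + b\,\Delta_a .
\end{align*}
```

For instance, with $a = b = 100$ and $\Delta_a = \Delta_b = 0.1$, the retained first-order terms amount to 20, while the neglected product $\Delta_a \Delta_b$ is only 0.01.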


3. LIMITING ABSOLUTE ERROR 


As was shown in section 2, it is possible in exceptional cases to
give an approximate value of the exact error, in the sense that the 
error of this determination is considerably less in absolute value 
than the error of the basic measurement. In many cases, it is 
easier to determine an upper bound on the absolute value of the 
exact error.

For an approximate number $a$, consider the smallest positive
number $\varepsilon_a$ that contains one or two significant figures and that is at
least equal to the absolute value of the exact error. In other
words,

$$\varepsilon_a \ge |\Delta_a|. \tag{1.3}$$

We shall call this number $\varepsilon_a$ the limiting absolute error.

This definition requires some explanation. For the most part, 
it is not difficult to find a positive number that is known to be 
greater than the absolute value of the exact error, and we could 
take such a number as the limiting absolute error. If several such 
numbers can be found, the smallest of them must, by definition, be 
chosen. It is sometimes convenient to replace a value accepted as 
the limiting absolute error with another, simpler one, that is, one 
that contains fewer significant figures. In such cases, we can only 
increase the original value that was used. 




If the limiting absolute error is known, we may write the obvious 
inequality* 


$$a - \varepsilon_a \le A \le a + \varepsilon_a \tag{1.4}$$


for the exact value of A. 
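
Inequality (1.4) translates directly into a pair of bounds. The following is a minimal sketch; the function name `exact_value_bounds` and the sample numbers are hypothetical, and `Decimal` is used only so that the printed bounds are free of binary rounding noise.

```python
from decimal import Decimal

def exact_value_bounds(a, eps):
    """Return (lower, upper) with lower <= A <= upper, as in (1.4)."""
    a, eps = Decimal(a), Decimal(eps)
    if eps < 0:
        raise ValueError("a limiting absolute error cannot be negative")
    return a - eps, a + eps

# Hypothetical approximate number 12.34 with limiting absolute error 0.01.
lower, upper = exact_value_bounds("12.34", "0.01")
print(lower, upper)  # 12.33 12.35
```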

Let us consider the particular case in which the limiting absolute
error of a number is determined by its decimal representation. 
Every approximate number, when written in this form, has a 
limited number of significant figures, depending on the accuracy of 
measurement or computation. For example, suppose that a length 
is measured by means of a ruler with dividing lines at each milli- 
meter and a vernier that gives the tenth parts of a millimeter. 
Then, the length will contain whole centimeters, and the tenth and 
hundredth parts of a centimeter. From the nature of the process 
of such a measurement, we may assert that the absolute value of 
the error in this measurement is less than 0.01 cm if (as is usually 
the case) in using the vernier, we always observe a smaller (or 
larger) division of the vernier when the end of the interval does 
not coincide exactly with any division of the vernier. With such a 
measuring instrument, we can always give the sign of the exact 
error because the approximate value is known to be less (or, in the
corresponding case, greater) than the exact value. This means of 
measurement can be called measurement without rounding off in 
analogy with the corresponding computational operation. 

Measurement with rounding off in this example would consist in 
seeking the division of the vernier that is closest to the end of the 
segment that is being measured and in writing the corresponding 
number of hundredth parts. The exact error is equal to the distance 
from the division taken on the vernier to the end of the segment. In
the case of a measurement with rounding off, the absolute value of 
this distance does not exceed one half the distance between the 
divisions of the vernier, the latter distance being equal to 0.01 cm. 

Thus, if the result of the measurement indicates only hundredth 
parts of a centimeter and the measurement is taken without rounding
off, the limiting absolute error of the measurement should be taken
as 0.01 centimeters. On the other hand, if there is rounding off of
the measurements, the limiting absolute error should be taken as 
one half this amount. These conclusions are valid for arbitrary 
approximate numbers. 

*We call the reader’s attention to the fact that the conditional notation that is sometimes
suggested,

$$A = a \pm \varepsilon_a,$$

is not advised because a similar notation has long been used in the theory of random
errors and there it has a different meaning.

By generally accepted convention, the limiting absolute error
of a number written in the decimal system of notation is taken
equal to the unit corresponding to the last significant figure on the
right (in other words, equal to 10 raised to the power that corresponds
to the position of this last significant figure) if the number is taken
without rounding off and equal to half this unit if the number is taken
with rounding off. By “last,” we mean the significant figure farthest
to the right. If we have other information on the limiting absolute
error, it must be stated, and when operations are performed on the
numbers in question, we must be guided by this information rather than by
convention.
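
The convention can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `limiting_absolute_error` is hypothetical, the number is passed as a string so that trailing zeros are preserved, and a whole number whose final zeros are not significant must be written in exponent form (for example `"64E2"`), since a plain `"6400"` would be read as having four significant figures.

```python
from decimal import Decimal

def limiting_absolute_error(numeral, rounded=True):
    """Limiting absolute error implied by the way a decimal numeral is written.

    The unit of the last written digit is 10**exponent; the error is that
    unit without rounding off and half of it with rounding off.
    """
    exponent = Decimal(numeral).as_tuple().exponent
    unit = Decimal(1).scaleb(exponent)
    return unit / 2 if rounded else unit

print(limiting_absolute_error("3.500"))                # 0.0005
print(limiting_absolute_error("3.5"))                  # 0.05
print(limiting_absolute_error("3.14", rounded=False))  # 0.01
```

Note that `"3.500"` and `"3.5"` give different limiting errors even though they denote the same number in exact arithmetic, which is precisely why trailing zeros to the right of the decimal point must not be dropped or added freely.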

This convention is used not only in cases in which the approxi- 
mate number is obtained as the result of measurement but also 
in those cases in which it is obtained from calculation and is the 
result of discarding some number of significant figures on the 
right. The simplest example of this type is offered by the con- 
version of a simple fraction into a decimal fraction with the re- 
tention of only a certain number of digits to the right of the decimal 
point. 

If digits are simply discarded (as in the case of a decimal 
fraction) or are replaced by zeros (as in the case of a whole num- 
ber), then on the basis of the rule that was stated above, the 
limiting absolute error is taken to be the unit of the last digit that 
is kept. By “last,” we mean the digit that has nothing to the right
of it in the case of a decimal fraction and only zeros in the case of 
a whole number.* When we simply discard digits, we obtain an 
approximate value with an accuracy up to the unit of the last 
digit kept. This number will be less than the original number. 
We may possibly agree in such cases to increase the last sig- 
nificant figure by unity, in which case the limiting absolute error 
will still be equal to the unit of the last significant figure but the 
approximate value obtained will be greater than the original 
number. 


Example 1. As we know, we may take $\pi = 3.1416$. Suppose that in some calculation
it is sufficient to take two digits to the right of the decimal point. Then, we would take
$\pi = 3.14$ (or $\pi = 3.15$), so that $\varepsilon_\pi = 0.01$. If we do not know how the approximate value
of this number was obtained, we can only assert that


$3.13 < \pi < 3.15$.


On the other hand, if it is known that the number has a remainder (that is, that the exact
value is greater than the approximate value), the lower bound will be 3.14.


*To avoid the possibility of confusion, we note the following about the method of
defining the limiting error. Let us illustrate with a simple example. Suppose that in
measuring a length with a precision instrument, we arrive at a result of 3.5003 cm. If we
only need an accuracy of three decimal places, we should take the length as equal to
3.500 cm and not discard any of the zeros (in contrast with the rules of exact arithmetic).
The reason is that, from what was agreed upon above with regard to our notation, the
limiting error is equal to $(1/2) \cdot 10^{-3}$ cm, but if we discarded one of the zeros, this would
indicate that it is equal to $(1/2) \cdot 10^{-2}$ cm. In this case, even the zeros to the right of the
decimal point are significant figures.

Suppose that the same length is measured with a ruler graduated in millimeters. Then,
we would obtain 3.5 cm with a limiting error of $(1/2) \cdot 10^{-1}$ cm. According to the formal
rules of arithmetic, we might also write this as 3.500 cm, but this would change the
estimate of the error.




Example 2. The equatorial radius of the earth $R$ is approximately equal to 6384 km.
Suppose that we wish to replace this number with a number containing only two significant
figures. If we discard those digits that we consider superfluous, we may take $R = 6300$
km with a remainder or $R = 6400$ km with a surplus. In both cases, $\varepsilon_R = 100$ km.


A simple discarding of digits is convenient in that the sign of 
the error is known, but when we do this we do not get an approximate
value that is as close as it might be to the actual value. From
this point of view it is more convenient to round off the approximate
number in a manner equivalent to the rounding off in the case of a
measurement. The rule for rounding a number off is generally
known and we do not need to explain it: if the first discarded digit 
is less than 5, the last undiscarded digit is kept unchanged; if the 
first discarded digit is greater than 5, the last undiscarded digit is 
increased by 1. If only the digit 5 is discarded and the following 
digits are unknown, the most common convention is to keep the 
last undiscarded digit unchanged if it is even and to increase it by 
1 if it is odd. This is rounding the number off to the nearest even 
digit. The error resulting from this rounding off is bounded by the 
following rule: the limiting absolute error of an approximation 
resulting from rounding off is equal to one half the unit of the last 
digit. 
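The two conventions described above (simple discarding of digits and rounding off) can be sketched in Python. This is only an illustration, assuming the standard `decimal` module; the helper names `unit`, `discard_digits`, `round_off` and `limiting_absolute_error` are hypothetical and do not come from the book. Here `places` is the number of digits kept to the right of the decimal point; a negative value keeps only tens, hundreds, and so on.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

def unit(places: int) -> Decimal:
    """Unit of the last digit kept, i.e. 10**(-places)."""
    return Decimal(1).scaleb(-places)

def discard_digits(x: str, places: int) -> Decimal:
    """Simply discard the digits beyond `places` (truncate toward zero)."""
    return Decimal(x).quantize(unit(places), rounding=ROUND_DOWN)

def round_off(x: str, places: int) -> Decimal:
    """Round off; a lone discarded 5 goes to the nearest even digit."""
    return Decimal(x).quantize(unit(places), rounding=ROUND_HALF_EVEN)

def limiting_absolute_error(places: int, rounded: bool) -> Decimal:
    """Unit of the last digit if digits were discarded, half of it if rounded."""
    return unit(places) / 2 if rounded else unit(places)

# pi to two decimal places, and the earth's radius to two significant figures
print(discard_digits("3.1416", 2), limiting_absolute_error(2, rounded=False))
print(round_off("3.1416", 2), limiting_absolute_error(2, rounded=True))
print(discard_digits("6384", -2), round_off("6384", -2))
# a lone discarded 5: the last kept digit becomes even
print(round_off("0.125", 2), round_off("0.135", 2))
```

Working with decimal strings rather than binary floats keeps the "lone 5" case exact, which is the only place where the round-to-even convention matters.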


Example 1. For the number $\pi$, we obtain the approximate value $\pi = 3.14$ in rounding
the number off to two decimal places. Therefore,


$\varepsilon_\pi = \tfrac{1}{2} \cdot 0.01$.


Example 2. For the equatorial radius of the earth, we have
$R = 6400$ km, $\varepsilon_R = 50$ km.


A certain inconvenience with regard to rounding a figure off consists in the fact that
the sign of the error of the approximate number is not known until we obtain the
approximate number. In subsequent operations, where only the approximate number
is given and it is known that it was obtained by rounding off, we may only assert that


$a - \varepsilon_a \le A \le a + \varepsilon_a$, $\varepsilon_a = \tfrac{1}{2} \cdot 10^{p}$,


where $10^{p}$ is the unit of the first significant figure on the right.


4. LIMITING RELATIVE ERROR 


The bound that we gave above by means of the limiting absolute 
error makes it possible to exhibit bounds between which the exact 
value of a number lies but it does not sufficiently characterize the 
quality of the measurements made if the approximate number is 
the result of measurements or calculations drawn from the results 
of measurements. To be able to consider a measurement or a 
calculation satisfactory, we need to know not only the smallness of 
$\varepsilon_a$ alone but also the smallness of $\varepsilon_a$ in comparison with $a$. To
clarify this, let us set 


$\varepsilon_a = \delta \cdot |a|$,

where $\delta$ is a positive number. Then, we may write

$a(1 - \delta) \le A \le a(1 + \delta)$, if $a > 0$,

and


$a(1 + \delta) \le A \le a(1 - \delta)$, if $a < 0$.


It is clear from these inequalities that an approximate value $a$
does not determine even the sign of the number $A$ when $\delta > 1$, since
the right and left sides of the inequalities in this case have different
signs and the actual value can be either positive or negative. From
this, it is clear that to characterize the quality of the approximation,
it is important to know not just the magnitude of the limiting absolute
error but also its relationship to the quantity $a$. A determination
of the equatorial radius of the earth with $\varepsilon_a = 1$ m would be
considered excellent, but a measurement of the dimensions of even
a large auditorium with the same limiting absolute error would be
poor. This example brings up another consideration that calls for
an additional type of error estimate. In
most problems, $a$ and $\varepsilon_a$ are denominate numbers and therefore
the numerical value of $\varepsilon_a$ depends on the choice of units of
measurement; that is, it is not a sufficiently complete characteristic of
the inaccuracy. Also, the values of the limiting absolute errors of
different measurements cannot be compared in those cases in
which the quantities measured have different dimensions, such as weight
and length.

In connection with these considerations, it is convenient to introduce
the concept of a limiting relative error. The limiting relative
error $\delta_a$ of an approximate number $a$ is defined as the ratio of its
limiting absolute error $\varepsilon_a$ to the absolute value of the number $a$:*


$\delta_a = \dfrac{\varepsilon_a}{|a|}$. (1.5)


If we know $\varepsilon_a$, we can find $\delta_a$ and vice versa.

Quite frequently, the quantity $\delta$ is expressed in percentages
or in parts per thousand (indicated by ‰). In computing the
limiting relative error, we try to obtain a number with only a few
significant figures. As a rule, the calculation is made in one’s
head and $\delta$ is simplified in such a way that the number obtained is


*It would be more logical to call the ratio of $\varepsilon_a$ to $|A|$ the limiting relative error. But
in practice, we cannot determine this value since we do not know the value of $A$. Let us
consider the two quantities $\varepsilon_a / |a|$ and $\varepsilon_a / |A|$.




greater than what it must be according to the definition. For
example, if $\pi = 3.14$ with rounding off, then


$\varepsilon_\pi = \tfrac{1}{2} \cdot 10^{-2}$, $\delta_\pi = \dfrac{0.5 \cdot 10^{-2}}{3.14} < 2 \cdot 10^{-3}$.


For the limiting relative error, we take


$\delta_\pi = 2 \cdot 10^{-3} = 2$‰.
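As a rough illustration of definition (1.5) and of the habit of simplifying the result upward, the computation for $\pi = 3.14$ can be sketched as follows. The function names are hypothetical, and rounding up to exactly one significant figure is an assumption made here for the sketch; the text only asks that the simplified value not be smaller than the defined one.

```python
import math

def limiting_relative_error(a: float, eps_a: float) -> float:
    """delta_a = eps_a / |a|, as in definition (1.5)."""
    return eps_a / abs(a)

def simplify_upward(delta: float) -> float:
    """Round delta UP to one significant figure, so the bound stays valid."""
    exponent = math.floor(math.log10(delta))
    return math.ceil(delta / 10.0 ** exponent) * 10.0 ** exponent

delta = limiting_relative_error(3.14, 0.5e-2)
print(delta)                   # a little under 1.6e-3
print(simplify_upward(delta))  # the value taken in practice
```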


5. ESTIMATE OF AN ERROR FROM THE NUMBER 
OF KNOWN DIGITS 


In practice, the relation between the number of significant figures 
that are known with certainty in an approximate number and the 
limiting relative error is used extensively. 

Suppose that a positive approximate number $a$ contains $s$
definitely known digits. Then, its decimal expansion is of the
form


$a = n_1 \cdot 10^{r} + n_2 \cdot 10^{r-1} + \ldots + n_s \cdot 10^{p}$,


where $n_1, n_2, \ldots, n_s$ are digits in the decimal representation with
$n_1 \ne 0$. The integers $r$ and $p$ (with $r \ge p$) and the positive integer $s$
are related by the obvious equation $p - r = 1 - s$.

Suppose that the number $a$ is obtained with rounding off. Then,
$\varepsilon_a = \tfrac{1}{2} \cdot 10^{p}$.


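Assuming for simplicity that $a$ and $A$ are both positive and using the definition of the
exact error, we obtain


$\dfrac{\varepsilon_a}{A} = \dfrac{\varepsilon_a}{a + \Delta_a} = \delta_a \left(1 + \dfrac{\Delta_a}{a}\right)^{-1}$.


Remembering that $|\Delta_a| / a \le \delta_a$, we obtain

$\left|\dfrac{\varepsilon_a}{A} - \delta_a\right| \le \delta_a \left[\delta_a + \delta_a^2 + \ldots\right]$.


Thus the difference between $\varepsilon_a / A$ and $\delta_a$ is a quantity of the second order of
magnitude if $\delta_a$ is of the first order. Consequently, which of the two quantities
$\delta_a$ and $\varepsilon_a / A$ is used to characterize the relative
error of the number $a$ is in practice of no significance.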


From the definition of limiting relative error (1.5), we have


$\delta_a = \dfrac{1}{2} \cdot \dfrac{10^{p}}{a}$.

In the expression for $\delta_a$, let us now replace $a$ with its decimal
expansion keeping only the first term. Obviously, this can only
increase the value of $\delta_a$:


$\delta_a \le \dfrac{1}{2} \cdot \dfrac{10^{p}}{n_1 \cdot 10^{r}} = \dfrac{1}{2 n_1} \cdot 10^{1-s}$.


This last expression can be taken for the limiting relative error


$\delta_a = \dfrac{1}{2 n_1} \cdot 10^{1-s}$. (1.6)

Here $s$ is the number of definitely known significant figures in the
approximate number and $n_1$ is the first digit of the number. We
should note that in accordance with the formula that we have
obtained, the limiting relative error depends only on the number of
known digits and $n_1$, but not on the position of the decimal point.

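We give a table of values of $\delta_a$ corresponding to different values
of $n_1$ and $s$ (with $\delta_a$ expressed in percent):


[Table of $\delta_a$ in percent, rows $n_1 = 1, 2, \ldots, 9$. The entries are illegible here except for the last two values in the row $n_1 = 9$: 0.0056 and 0.00056.]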


To get an approximate evaluation of the relative error, we may 
take the average value of the first digit; that is, we may set 
$n_1 = 5$. Then, the limiting relative error of a number with one
definitely known digit is a number of the order of ten percent; in
the case of two definitely known digits, it is one percent; in the case
of three, it is 0.1 percent, etc. We should note that in many applied 
problems, a limiting relative error of the order of a tenth of a 
percent is sufficient. It is enough to carry out such calculations 
with three significant figures. 
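Values of the kind tabulated above can be regenerated directly from formula (1.6). The sketch below is an illustration only: the function name is hypothetical, the range $s = 1, \ldots, 5$ is an assumption about the table's columns, and the printed two-figure rounding need not match the rounding used in the original table.

```python
def relative_error_percent(n1: int, s: int) -> float:
    """Formula (1.6), 1/(2*n1) * 10**(1 - s), expressed in percent."""
    return 100.0 / (2 * n1) * 10.0 ** (1 - s)

# Rows: first digit n1; columns: number of definitely known digits s.
print("n1 " + "".join(f"{'s=' + str(s):>10}" for s in range(1, 6)))
for n1 in range(1, 10):
    row = "".join(f"{relative_error_percent(n1, s):>10.2g}" for s in range(1, 6))
    print(f"{n1:<3}" + row)
```

With the "average" first digit $n_1 = 5$ the row reads 10, 1, 0.1, ... percent, which is the rule of thumb stated in the text.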

We have just determined the limiting relative error from the 
number of known digits. It is easy to solve the inverse problem 
also, namely, the problem of determining the necessary number 




of known digits for a given limiting relative error. Suppose that we
need to determine $s$ in such a way that $\delta_a = 10^{-q}$. Then, from
formula (1.6),


$\dfrac{1}{2 n_1} \cdot 10^{1-s} \le 10^{-q}$, $\quad 10^{s} \ge \dfrac{5}{n_1} \cdot 10^{q}$.

If we replace $n_1$ with the average number 5 (where $n_1$ can take any
integral value between 1 and 9), we get $10^{s} \ge 10^{q}$ and $s \ge q$.
Consequently, on the average, the number of known digits must be equal to
the absolute value of the power of 10 in the given value of $\delta_a$. For
example, if we need to find $\delta_a$ of an order of one percent, the
number must have no fewer than two definitely known digits.

Let us obtain a more reliable estimate than the average. We
begin with the decimal expansion of the number $a$:


$a = n_1 \cdot 10^{r} + n_2 \cdot 10^{r-1} + \ldots + n_s \cdot 10^{p}$.


Then,
$(n_1 + 1) \cdot 10^{r} > a \ge n_1 \cdot 10^{r}$.
Hence,

$\delta_a = \dfrac{1}{2} \cdot \dfrac{10^{p}}{a} > \dfrac{10^{p}}{2 (n_1 + 1) \cdot 10^{r}} = \dfrac{10^{1-s}}{2 (n_1 + 1)}$


if $a$ is an approximate number obtained upon rounding off. If it is
given that $\delta_a = 10^{-q}$, we need to take $s$ such that the right-hand side of
this inequality does not exceed $10^{-q}$; that is,


$10^{s} \ge \dfrac{10^{q+1}}{2 (n_1 + 1)}$. (1.7)


For the necessary number of known digits, we need to take the
smallest integer $s$ that satisfies this inequality. We can get
something like the average value if we take $n_1 = 4$. Then, $s \ge q$.

The table below, giving the number of known digits for a given 
relative error, was compiled from formula (1.7): 


This table can be replaced with the simple formula: if the first 
digit does not exceed 3, the number of known digits must exceed 
by 1 the absolute value of the power of 10 in the given relative 




error. In the remaining cases, these numbers are equal. The value
of 0 for $q = 0$ and $n_1 = 4, \ldots, 9$ means that there will be a 100-percent
error if we do not know a single digit in the number for certain but
only that the first digit is not less than 4. To see this, suppose that
the exact value of a one-digit number is 5. If an error of 100 percent
is allowed for the number, the absolute value of the error can
attain the value of 5 and the approximate number can have any value
from 0 to 10; that is, the first digit will not be known.
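The rule just stated can be checked mechanically. The sketch below assumes that formula (1.7) reads $10^{s} \ge 10^{q+1} / (2 (n_1 + 1))$, which is the reading consistent with the value 0 quoted for $q = 0$ and $n_1 = 4, \ldots, 9$; the function name is hypothetical. Integer arithmetic is used so that the borderline case $n_1 = 4$ is decided exactly.

```python
def digits_needed(n1: int, q: int) -> int:
    """Smallest integer s >= 0 with 2*(n1 + 1)*10**s >= 10**(q + 1)."""
    s = 0
    while 2 * (n1 + 1) * 10 ** s < 10 ** (q + 1):
        s += 1
    return s

# Rows: first digit n1; columns: q = 0, 1, 2, 3 (relative error 10**-q).
for n1 in range(1, 10):
    print(n1, [digits_needed(n1, q) for q in range(4)])
```

The output reproduces the simple formula: $s = q + 1$ when the first digit does not exceed 3, and $s = q$ otherwise.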


Chapter 2 


ERRORS INCURRED IN FUNDAMENTAL 
ARITHMETIC OPERATIONS 


In this chapter, we shall consider the following problem: if the 
limiting absolute and relative errors of numbers upon which the 
operations of addition, subtraction, multiplication, and division are 
performed are known, what error will exist in the result? 


6. ADDITION 


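Suppose that

$u = a_1 + a_2 + \ldots + a_n$, (2.1)

where $a_1, a_2, \ldots, a_n$ are approximate numbers, either positive or
negative or both. Let us denote their limiting absolute errors by
$\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$. Also, let us denote by $\Delta_1, \Delta_2, \ldots, \Delta_n$ the exact
errors in the individual addends and by $\Delta_u$ the exact error in $u$.
Obviously,


$\Delta_u = \Delta_1 + \Delta_2 + \ldots + \Delta_n$, (2.2)


and hence


$|\Delta_u| \le \sum_{k=1}^{n} |\Delta_k|$. (2.3)


Since by definition (see (1.3)),


$|\Delta_k| \le \varepsilon_k$, $k = 1, 2, \ldots, n$,


we have


$|\Delta_u| \le \sum_{k=1}^{n} \varepsilon_k$.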




Therefore, $\varepsilon_u$ can be determined from the equation
$\varepsilon_u = \sum_{k=1}^{n} \varepsilon_k$. (2.4)


Thus, for the limiting absolute error of the sum, we may take the 
sum of the individual absolute errors in the addends. 


Example. Let us set $u = 3.14 + 0.843 + 0.0365$. Let us suppose that the approximate
addends are given with accuracy up to the unit of the last digit. Then,


$\varepsilon_u = 0.01 + 0.001 + 0.0001 = 0.0111$.
If we need to write this estimate more simply, we should take
$\varepsilon_u = 0.02$.
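Formula (2.4) applied to this example can be sketched as follows. The sketch assumes, as in the example, that each addend is accurate to the unit of its last written digit; the helper name `unit_of_last_digit` is hypothetical. Decimal strings are used so that the number of written digits is preserved.

```python
from decimal import Decimal

def unit_of_last_digit(x: str) -> Decimal:
    """Unit of the last written digit, e.g. '0.843' -> 0.001."""
    return Decimal(1).scaleb(Decimal(x).as_tuple().exponent)

addends = ["3.14", "0.843", "0.0365"]
u = sum(Decimal(x) for x in addends)
eps_u = sum(unit_of_last_digit(x) for x in addends)  # formula (2.4)

print(u, eps_u)
```

The sum is dominated by the error of the least exact addend, which is why the extra digits of the other two addends contribute nothing reliable.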


From this example, it is clear that there is no sense in trying to
obtain approximate addends with different numbers of digits to the
right of the decimal point. According to the value obtained for $\varepsilon_u$ in
this example, the error can exceed one hundredth and, therefore, 
there is no sense in writing the thousandth and ten-thousandth parts 
in the result, and in fact we cannot vouch for the accuracy even of 
the hundredth part. In adding approximate numbers, we may pro- 
ceed in two ways. 

The first consists in ‘‘trimming’’ all the terms to be added 
to fit the least exact one. Since the limiting absolute errors of 
the terms will be equal, the limiting absolute error of the sum 
will be equal to the product of the limiting absolute error of 
a single term and the number of terms. However, it immediately
follows from this that in the sum we lose exactly one digit
after the decimal point if the number of terms is a one-digit number,
two digits if the number of terms is a two-digit number,
etc. 

However, it should be noted that this estimate of the error is 
suitable only when there are only a few terms to be added. If there 
are twenty or fifty terms, for example, the estimate of the error 
can be quite excessive since with this method it is actually as- 
sumed that all the approximate values for the terms have the 
largest errors in absolute value and that all are of the same sign. 
Such a case can, of course, be encountered in calculation, but it is 
very rare. As a rule, the numbers $\Delta_k$ have different signs and these
errors partially cancel each other out in the sum of a large num- 
ber of terms. Therefore, the actual error in the sum of a large 
number of terms can be considerably less than the number that 
would be obtained from assigning each term the maximum error 
(see next section). 

The second method of adding numbers with different numbers of 
digits to the right of the decimal point consists in adding separately 
by groups those numbers with the same number of digits to the 
right of the decimal point and then rounding off the sums to
the smallest number of digits to the right of the decimal point. 




Example.


$u = 3.14 + (0.847 + 0.936) + (0.0746 + 0.0358) =$
$= 3.14 + 1.783 + 0.1104 \approx 3.14 + 1.783 + 0.110 =$
$= 3.14 + 1.893 \approx 3.14 + 1.89 = 5.03$.


If the terms are given with accuracy up to half of the unit of the last digit, calculation
of the error in the result in this example takes the following form: The limiting error of
the sum of the fourth and fifth terms is equal to $2 \cdot 0.00005 = 0.0001$. After we round this
sum off to three digits to the right of the decimal point, we add the limiting error 0.0005. We
then have in all 0.0006. Addition of the second and third terms to the preceding sum gives
a limiting error of $2 \cdot 0.0005 + 0.0006 = 0.0016$. Discarding the third digit to the right of
the decimal point increases the error by 0.005, giving us in all a value of 0.0066, which we may
replace with 0.007. Addition of the last sum to the first of the terms gives a limiting
error $0.005 + 0.007 = 0.012$.

If we had used the first method, we would have needed to round off all the terms to two
digits to the right of the decimal point. We would then have obtained


$u = 3.14 + 0.85 + 0.94 + 0.07 + 0.04 = 5.04$
with a limiting absolute error of
$(1/2) \cdot 0.01 \cdot 5 = 0.025$.


The second method can be applied in those cases in which the number of terms is not
very small.
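The bookkeeping of the second method can be followed step by step in a short sketch. It assumes that every term is correct to half a unit of its last digit and that each intermediate sum is rounded off before the next, coarser group is added; the function name `grouped_sum` is hypothetical. Unlike the worked example, the sketch does not replace 0.0066 by 0.007 along the way, so its final bound is slightly smaller than 0.012.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_EVEN

def grouped_sum(terms):
    """Add terms group by group, finest group first, tracking the error bound."""
    groups = defaultdict(list)
    for t in terms:
        d = Decimal(t)
        groups[-d.as_tuple().exponent].append(d)  # key: digits after the point
    total, err = Decimal(0), Decimal(0)
    for i, places in enumerate(sorted(groups, reverse=True)):
        half_unit = Decimal(5).scaleb(-places - 1)
        if i > 0:
            # round the running sum to the coarser group's number of places
            total = total.quantize(Decimal(1).scaleb(-places),
                                   rounding=ROUND_HALF_EVEN)
            err += half_unit
        total += sum(groups[places])
        err += half_unit * len(groups[places])
    return total, err

total, err = grouped_sum(["3.14", "0.847", "0.936", "0.0746", "0.0358"])
print(total, err)
```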


7. STATISTICAL ESTIMATE OF THE ERROR OF A SUM* 


Suppose that


$u = \sum_{k=1}^{n} a_k$,


where the $a_k$ are approximate numbers with exact errors $\Delta_k$. For each of these numbers,
we know the limiting absolute error $\varepsilon_k$; that is, $\varepsilon_k \ge |\Delta_k|$. The exact error of the sum is


$\Delta_u = \sum_{k=1}^{n} \Delta_k$.


In order to construct the probabilistic characteristics of the sum $u$, we need to make
certain assumptions about the distribution function of the quantities $\Delta_k$. In ordinary
calculations with a definite number of digits to the right of the decimal point, we assume
that the error of a randomly chosen term is a random number subject to a uniform law
of distribution in the interval from $-\varepsilon_k$ to $\varepsilon_k$. According to the preceding section, the
accuracy of calculations is only slightly increased if we take terms with unequal limiting
absolute errors. Therefore, we shall assume that all the $\varepsilon_k$ are equal to a single number
$\varepsilon$.

Under these assumptions, the probability density of each of the terms $\Delta_k$ will be equal
to $1/(2\varepsilon)$ and the center of distribution will be equal to 0. Therefore, the variance (or
dispersion) of each of the exact errors is determined by the formula


$\operatorname{Var} \Delta_k = \dfrac{1}{2\varepsilon} \int_{-\varepsilon}^{+\varepsilon} \Delta^2 \, d\Delta = \dfrac{\varepsilon^2}{3}$.

*This section should be read after completing the third part of the book.




Assuming that the exact errors are independent, we then obtain by using the theorem on
the variance of a sum


$\operatorname{Var} \Delta_u = \dfrac{1}{3} n \varepsilon^2$.


For a statistical estimate of $\Delta_u$, we still need to construct the distribution function for
$\Delta_u$. The law of distribution for a sum of a finite number of terms is rather tedious to
obtain, but we can manage without it. Let us assume that the sum obeys the normal law with
center 0 and variance $n\varepsilon^2/3$. From the sigma rule, we may assume


$P\left(|\Delta_u| < \varepsilon \sqrt{n/3}\right) = 0.68$,


where $\Delta_u$ is the exact error of the sum, and from the three-sigma rule,


$P\left(|\Delta_u| < \varepsilon \sqrt{3n}\right) = 0.9973$.


The quantity $\varepsilon \sqrt{3n}$ can be taken as the limiting absolute error of the sum of $n$ terms. If
we used the maximum limiting error method, we would obtain $n\varepsilon$.

It is easy to see that if $n > 3$, the statistical estimate of the error of the sum gives a
lower limiting error than the usual estimate. If we add fifty terms together each with an
accuracy up to 0.005, the maximum limiting error will be 0.25; that is, even the hundredths
are not reliable in the sum, whereas the probable error will be $0.005 \cdot \sqrt{150} \approx 0.06$. With a
probability of 0.9973, we may expect the total error not to exceed 0.06 in absolute value.*
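The comparison for fifty terms can be reproduced numerically. The sketch below computes both limits and then, as a plausibility check only, draws uniformly distributed errors with Python's pseudo-random generator; the function names, the number of trials and the seed are arbitrary assumptions of this illustration.

```python
import math
import random

def maximum_limit(n: int, eps: float) -> float:
    """Usual estimate: every term has the largest error, all of one sign."""
    return n * eps

def three_sigma_limit(n: int, eps: float) -> float:
    """Statistical estimate eps * sqrt(3n) from the three-sigma rule."""
    return eps * math.sqrt(3 * n)

def fraction_within(n, eps, limit, trials=5000, seed=0):
    """Share of simulated sums of n uniform errors that stay below `limit`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = sum(rng.uniform(-eps, eps) for _ in range(n))
        if abs(total) < limit:
            hits += 1
    return hits / trials

n, eps = 50, 0.005
print(maximum_limit(n, eps), three_sigma_limit(n, eps))
print(fraction_within(n, eps, three_sigma_limit(n, eps)))
```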

Of course, this leaves unanswered the question of the closeness of the distribution of
the overall error to a normal distribution, which we used to determine the
probability. Let us make the calculations for the simplest case.

Suppose that $n = 2$. From the formula for the distribution of the sum of two addends,
we have Simpson’s distribution with base from $-2\varepsilon$ to $+2\varepsilon$, for which the probability
density $\varphi(\Delta_u)$ is given by the formulas


$\varphi(\Delta_u) = \dfrac{1}{2\varepsilon}\left(1 + \dfrac{\Delta_u}{2\varepsilon}\right)$, if $-2\varepsilon \le \Delta_u \le 0$,


and


$\varphi(\Delta_u) = \dfrac{1}{2\varepsilon}\left(1 - \dfrac{\Delta_u}{2\varepsilon}\right)$, when $0 \le \Delta_u \le 2\varepsilon$.


(The factor $1/(2\varepsilon)$ in front of the parenthesized expressions is obtained from the
condition of normalization of the probability density, according to which the area bounded by
the distribution curve is equal to unity.)

The variance of the sum is determined by the formula


$\operatorname{Var}_2 \Delta_u = \dfrac{2}{3} \varepsilon^2$


*If we do not know the law of distribution of the sum of the errors, we may calculate
the probabilities from the variance alone by using Chebyshev’s inequality:


$P(|\Delta_u| < t\sigma_u) > 1 - \dfrac{1}{t^2}$, $\sigma_u = \sqrt{\operatorname{Var} \Delta_u}$.


If we wish to obtain limiting errors with a probability greater than 0.99, then $t^{-2} = 0.01$
and $t = 10$, so that


$P(|\Delta_u| < 10\sigma_u) > 0.99$.
In our problem, $\sigma_u = \varepsilon \sqrt{n/3}$. Therefore, in our example, the limiting error is equal
to $10 \cdot 0.005 \cdot \sqrt{50/3} \approx 0.2$ with a probability exceeding 0.99. This result is still
somewhat better than an estimate by using the maximum limiting errors.




(where the subscript 2 on the ‘‘Var’’ indicates that the number of terms is equal to 2).
Let us calculate the probability that the absolute value of the error of the sum does not
exceed its mean square deviation:


$P\left(|\Delta_u| < \varepsilon \sqrt{2/3}\right) = 2 \int_{0}^{\varepsilon\sqrt{2/3}} \dfrac{1}{2\varepsilon}\left(1 - \dfrac{\Delta}{2\varepsilon}\right) d\Delta = \sqrt{\dfrac{2}{3}} - \dfrac{1}{6} \approx 0.65$.


Thus, even in the case of two terms, the probability of an error not exceeding the
mean square error is equal to 0.65 instead of the 0.68 obtained by the normal law. To
find the probability of a deviation not exceeding $3\sigma$ has no meaning here since the limiting
error in absolute value is equal to $2\varepsilon$ and the tripled mean square error will be


$3\varepsilon \sqrt{2/3} = \varepsilon \sqrt{6} > 2\varepsilon$.
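The figure 0.65 can be verified by integrating Simpson's density numerically and comparing with the normal law. This is a sketch under the assumptions of the text (two independent errors, each uniform on $[-\varepsilon, \varepsilon]$); the function names and the midpoint rule are choices made for this illustration.

```python
import math

def simpson_density(x: float, eps: float) -> float:
    """Triangular (Simpson) density of the sum of two uniform errors."""
    if abs(x) > 2 * eps:
        return 0.0
    return (1 - abs(x) / (2 * eps)) / (2 * eps)

def prob_within(t: float, eps: float, steps: int = 10000) -> float:
    """P(|error| < t) by the midpoint rule on [-t, t]; `steps` should be even."""
    h = 2 * t / steps
    return h * sum(simpson_density(-t + (i + 0.5) * h, eps) for i in range(steps))

eps = 1.0
sigma = eps * math.sqrt(2 / 3)           # mean square deviation for n = 2
p_simpson = prob_within(sigma, eps)
p_normal = math.erf(1 / math.sqrt(2))    # P(|Z| < 1) for the normal law

print(p_simpson, p_normal)
```

Because the density is piecewise linear and the kink at zero falls on a cell boundary, the midpoint rule is exact here up to rounding, so the result agrees with the closed form $\sqrt{2/3} - 1/6$.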


8. SUBTRACTION OF CLOSE NUMBERS 


Subtraction can be considered algebraic addition and in estimating
the error of a difference we may use the result obtained
in Section 6. If


$u = a - b$, (2.5)
then

$\varepsilon_u = \varepsilon_a + \varepsilon_b$, $\quad \delta_u = \dfrac{\varepsilon_a + \varepsilon_b}{|u|}$. (2.6)


In subtracting, we sometimes encounter cases presenting difficulties
in the matter of computation. Suppose that $b$ differs only
slightly from $a$. Since $a$ and $b$ have a finite number of significant
figures, there will be few significant figures in the difference, and
this means that the relative error of the difference will be great.
A special difficulty arises in the case in which the result of such
a subtraction will be used in subsequent calculations. For example,
if there are only two digits known with certainty in the minuend
and subtrahend, we may not have more than one digit in the result
of an operation between this number and other numbers; at best,
there will be only two certain digits. In such cases, we say that
accuracy is lost since no increase in the accuracy of the other
numbers (besides $a$ and $b$) can correct the result.




There are two ways out of this situation. First, we may try to
increase in a significant manner the number of digits known with
certainty in the numbers $a$ and $b$ in order to have the number of
known digits needed in the result. However, it may not be possible
to increase the accuracy of $a$ and $b$ very much (as, for example,
when $a$ and $b$ are obtained from observations). Then, we try to
rewrite the computational formulas in such a way as to remove the
difference between close numbers.

If the minuend is only slightly different from the subtrahend,
both these numbers can be represented in the form


$a = m + \alpha$, $\quad b = m + \beta$,


where $\alpha$ and $\beta$ are numbers that differ from each other by a large
amount. The simplest transformation of the formulas consists in
trying to remove the common part $m$ by a transformation of the
formulas such that the calculation is reduced to the following:


$u = \alpha - \beta$.


If this technique is not possible either, we shall need a more
complicated transformation of the formulas.


Example. Suppose that we need to compute the left side of the formula


$(r_1 + r_2 + s)^{3/2} - (r_1 + r_2 - s)^{3/2} = u$


(a well-known formula of Euler expressing the relationship between two radius vectors
of a parabola, the chord length, and the time). In ordinary problems, $s$ is a small number
in comparison with $r_1 + r_2$. Therefore, a direct calculation by using this formula leads
to a loss of accuracy. We may make the following transformation:


$u = (r_1 + r_2)^{3/2} \left[\left(1 + \dfrac{s}{r_1 + r_2}\right)^{3/2} - \left(1 - \dfrac{s}{r_1 + r_2}\right)^{3/2}\right]$.


By hypothesis, the fraction $s/(r_1 + r_2)$ is small in comparison with unity. Therefore, the
expressions in the parentheses can be expanded in a binomial series.
When this is done, we obtain


$u = (r_1 + r_2)^{3/2} \left(3\sigma - \dfrac{1}{8}\sigma^3 - \ldots\right)$, $\quad \sigma = \dfrac{s}{r_1 + r_2}$.


This transformation eliminated the common part in the minuend and subtrahend (unity),
which made it possible to increase the accuracy of the remainder. Suppose, for example,
that $r_1 + r_2 = 2.000$ and that $s = 0.01423$ (note that these numbers are given with four
definitely known digits). A direct calculation will give $u = 0.060$ (when we make the
calculation with 4-place tables of logarithms); that is, the result contains only two
definitely known digits although the original numbers contained four. However, if we
make the calculation by using the transformed formula, we obtain 0.06037 with a doubtful
last digit.*
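The effect can be imitated in a few lines. In the sketch below, rounding the two powers to four significant figures before subtracting is only a stand-in for work with 4-place tables, not a reproduction of it; the function names are hypothetical, and the series is cut after the $\sigma^5$ term, whose coefficient $3/128$ follows from the binomial expansion.

```python
def direct(r_sum: float, s: float) -> float:
    """(r1+r2+s)^(3/2) - (r1+r2-s)^(3/2), computed as written."""
    return (r_sum + s) ** 1.5 - (r_sum - s) ** 1.5

def transformed(r_sum: float, s: float) -> float:
    """(r1+r2)^(3/2) * (3*sigma - sigma^3/8 - 3*sigma^5/128), sigma = s/(r1+r2)."""
    sigma = s / r_sum
    series = 3 * sigma - sigma ** 3 / 8 - 3 * sigma ** 5 / 128
    return r_sum ** 1.5 * series

r_sum, s = 2.000, 0.01423

# Stand-in for four-figure work: round each power to three decimals first.
coarse = round((r_sum + s) ** 1.5, 3) - round((r_sum - s) ** 1.5, 3)

print(coarse)                 # only about two trustworthy figures survive
print(transformed(r_sum, s))  # the close difference never appears
print(direct(r_sum, s))       # full double precision, for comparison
```

In double precision the direct formula still loses a few of its sixteen digits; with four-figure inputs it loses half of them, which is the situation the text describes.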


*We might note that in this example it would have been possible to get rid of the close
difference in a simpler manner. If we multiply and divide the left side of the original
formula by the sum


$(r_1 + r_2 + s)^{3/2} + (r_1 + r_2 - s)^{3/2}$,


9. MULTIPLICATION 
Suppose that
$u = a \cdot b$. (2.7)

Let us determine $\varepsilon_u$ and $\delta_u$, assuming that we know $\varepsilon_a$ and $\varepsilon_b$. To
simplify the calculations, let us assume that $a$ and $b$ are both
positive. The exact product will be

$U = A \cdot B = (a + \Delta_a)(b + \Delta_b)$ (2.8)
or

$U = u + a\Delta_b + b\Delta_a + \Delta_a \Delta_b$.


Assuming that $\Delta_a$ and $\Delta_b$ are small in comparison with $a$ and $b$, we
discard the product $\Delta_a \Delta_b$ as a second-order infinitesimal. Then,


$U - u = a\Delta_b + b\Delta_a$ (2.9)

and
$|U - u| \le a|\Delta_b| + b|\Delta_a|$. (2.10)
If we replace the absolute values of the exact errors with the
limiting absolute errors, we shall obtain an upper bound on the


absolute value of the exact error, which we may take as the
limiting absolute error of the product of the two factors


$\varepsilon_u = a\varepsilon_b + b\varepsilon_a$. (2.11)


This formula is easily extended to the product of an arbitrary
number of factors. If


$u = a_1 a_2 \ldots a_s$, (2.12)

then

$\varepsilon_u = a_1 a_2 \ldots a_{s-1} \varepsilon_s + a_1 a_2 \ldots a_{s-2} a_s \varepsilon_{s-1} + \ldots + a_2 a_3 \ldots a_s \varepsilon_1$. (2.13)


we obtain


$u = \dfrac{2 (r_1 + r_2)^{3/2} (3\sigma + \sigma^3)}{(1 + \sigma)^{3/2} + (1 - \sigma)^{3/2}}$, where $\sigma = \dfrac{s}{r_1 + r_2}$.


However, we note that the method of expansion in a series is more general than the
particular method that we have just given.



Here, it is assumed that all the factors are positive. If any of them
are negative, they should be replaced by their absolute values. 


Example. Suppose that we are determining the circumference of a circle of radius
$R = 3.484$ m. The number $\pi$ can be taken with whatever number of definitely known digits
we may need. Let us take $2\pi = 6.283$. Each of the factors is given with an accuracy up to
$(1/2) \cdot 10^{-3}$. Therefore, the limiting absolute error is given by the formula


$\varepsilon_c = (R + 2\pi) \cdot \tfrac{1}{2} \cdot 10^{-3}$.


We note that to calculate the error, there is no need to take the values of $R$ and $\pi$
exactly as given, since otherwise evaluating the error can require more labor than the
basic calculation.

Let us substitute 10 for $R + 2\pi$ in the expression for $\varepsilon_c$ (which is admissible since
the estimate will only become cruder). Then,


$\varepsilon_c = 5 \cdot 10^{-3}$;


that is, we need to limit ourselves to hundredths in the product. It is easy to see that the
product will contain four definitely known digits.
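Formula (2.13) and the circumference example can be sketched as follows. The function name is hypothetical, and the factors are replaced by their absolute values as the text prescribes for negative factors; `math.prod` requires Python 3.8 or later.

```python
import math

def product_limiting_error(factors, errors):
    """Formula (2.13): each error times the product of all other factors."""
    total = 0.0
    for k, eps_k in enumerate(errors):
        others = math.prod(abs(a) for i, a in enumerate(factors) if i != k)
        total += others * eps_k
    return total

R, two_pi = 3.484, 6.283
half_unit = 0.5e-3

eps_c = product_limiting_error([R, two_pi], [half_unit, half_unit])
crude = 10 * half_unit            # R + 2*pi replaced by 10, as in the text
c = round(R * two_pi, 2)          # keep hundredths only

print(eps_c, crude, c)
```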


The multiplication of approximate numbers can be carried out 
by different methods depending on the desired accuracy of the 
result. In a direct multiplication, there is no need to count all the 
digits that are obtained when the calculation is made. In particular,
various methods of shortened multiplication that make it possible 
to simplify the labor have been worked out.* Usually, in this 
shortened multiplication, the digit following the last abbreviation 
is determined and it is discarded when the figure is rounded off. 
In our example, we need to consider the thousandth parts, but 
these are discarded after the addition. Thus, 


$c = 21.89$ m.


If tables of logarithms are used for the calculations, the number 
of digits to be counted is automatically determined by the number of 
digits in the table. 

But today multiplications are often carried out on calculating 
machines. These machines automatically produce all the digits that
may formally be obtained. In the result, the decimal point appears 
in accordance with the familiar rules of arithmetic. A calculation 
of the accuracy is made, and the number of digits after the 
decimal point that can be assumed as certain is determined. Then, 
a second marker is used to denote which digits must be kept. 

Let us now determine the limiting relative error in the product. 
From the general formula for an arbitrary number of factors, we 
obtain

$\dfrac{\varepsilon_u}{u} = \sum_{k=1}^{s} \dfrac{\varepsilon_k}{a_k}$.


*See the book by Ya. S. Bezikovich in the bibliography to Part I at the end of the
present book.




By definition, $\delta_k = \varepsilon_k / a_k$; therefore,


$\delta_u = \sum_{k=1}^{s} \delta_k$. (2.14)


Thus, the limiting relative error of a product is equal to the sum 
of the limiting relative errors of the factors. 

This result is of great significance. Let us recall that the 
limiting relative error is closely related to the number of known 
(definitely known) digits. Therefore, when we multiply, there is no 
sense in taking factors with different numbers of known digits. 
For if, for example, there are three known digits in one factor 
and five in another, the product will have a relative error cor- 
responding to a number of known digits not exceeding three. 
Consequently, in the product, the number of known digits will be 
equal to or less than the smallest number of known digits in any 
one of the various factors. If the number of factors is a one-digit 
number, we may formulate the following rule: the product of a one- 
digit number of factors has as many known digits as does the 
factor with the smallest number of known digits, or possibly one 
fewer. 


Example. Let us calculate the area of a circle of radius 2.37 cm. The area $P$ is
equal to the product of three factors of which two have three definitely known digits
each. Therefore, we need only take the first three digits of $\pi$ as well, that is, 3.14. The
relative error of the product is approximately equal to


$\delta_P \approx 2 \cdot \dfrac{0.5 \cdot 10^{-2}}{2} + \dfrac{0.5 \cdot 10^{-2}}{3} < 7 \cdot 10^{-3}$.


Here, we used 2 instead of 2.37 and 3 instead of 3.14. (Since these simplifications
increase the limiting error, they are admissible.) By a crude estimate that one can do in
one’s head, the area will be close to 20. Therefore, the limiting absolute error is equal
to $1.4 \cdot 10^{-1}$. Consequently, we should take only the tenths in the product; the last digit
in this product is not known with certainty. When we carry out the calculation, we obtain
$P = 17.6$ cm². Each of the factors had three known digits with rounding off and the product
has two known digits and one uncertain digit with a limiting absolute error of 0.14.
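The same estimate can be made without the mental simplifications, using formula (2.14) directly. This is a sketch with a hypothetical function name; because it uses 2.37 and 3.14 instead of 2 and 3, and the computed area instead of 20, its bounds come out somewhat below the crude values 0.007 and 0.14 of the example.

```python
def product_relative_error(factors, errors):
    """Formula (2.14): sum of the limiting relative errors of the factors."""
    return sum(eps / abs(a) for a, eps in zip(factors, errors))

r, pi_approx = 2.37, 3.14
half_unit = 0.5e-2                      # each factor is rounded to hundredths

delta_P = product_relative_error([pi_approx, r, r], [half_unit] * 3)
area = pi_approx * r * r
eps_P = delta_P * area                  # back to a limiting absolute error

print(round(area, 1), delta_P, eps_P)
```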


Let us apply the rule that we have obtained to an integral
power of a number. If


$u = x^{n}$, (2.15)


then,


$\varepsilon_u = n x^{n-1} \varepsilon_x$, $\quad \delta_u = n \delta_x$ $\quad (x > 0)$. (2.16)


The number of known digits is found just as with ordinary
multiplication. In determining $\varepsilon_u$, there is, of course, no need to take
the given value of $x$ with many digits. Ordinarily, we take only
the first digit increased by 1.




Example.


$u = 3.458^3$; $\quad \varepsilon_u = 3 \cdot 4^2 \cdot \tfrac{1}{2} \cdot 10^{-3}$


or (increasing this value) $\varepsilon_u = 3 \cdot 10^{-2}$. It follows from this that we need to take only
tenths in the product. The result will be accurate up to $(1/2) \cdot 10^{-1}$ since $\varepsilon_u < 5 \cdot 10^{-2}$.


From this example, it is clear that we can vouch for only three
digits when the number that is taken to a power has four known
digits. The estimate is crude, but even in an exact calculation
we cannot vouch for the hundredths in the product. This can be
shown in the following manner: If we wish to calculate with reserve
digits, then $x^2 = 11.958$ and $x^3 = 41.351$. Let us now take the cube
of the bounds between which the exact value of the number lies. The
cube of the number lies between $3.4575^3 = 41.33$ and $3.4585^3 = 41.37$.
This calculation shows clearly that the hundredths in the product
are unreliable but that the error in the hundredths does not exceed
four units. Therefore, in making calculations, we often introduce
an extra digit (in comparison with the estimate). In the present
example, we could have taken the product equal to 41.36, but in
making subsequent use of this number, we would have needed to
take into account that its limiting absolute error is not $(1/2) \cdot 10^{-2}$ but 0.03.
In the example given above, a convenient order in which to perform
the calculations on a calculating machine would be as follows:
First square the number, take four digits (hundredths) from the
machine, namely, 11.96, multiply by 3.458, and then round the
figure off to get $u = 41.36$.

Thus, we can take the square or cube of a number and be certain 
that we do not lose more than one known digit. One may take as 
many digits as there are in the number that is being raised to the 
power, but the last digit will be doubtful. Of course, it is not per- 
missible to take more digits in the result than in the original 
number. 
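The comparison between the linear estimate (2.16) and the cubes of the bounds can be reproduced with a short Python sketch. The function name `power_error` is an assumption introduced here for illustration; the numbers are those of the example above.

```python
def power_error(x, eps_x, n):
    """Linearized limiting absolute error of x**n for x > 0, as in formula (2.16)."""
    return n * abs(x) ** (n - 1) * eps_x

x, eps_x, n = 3.458, 0.0005, 3

eps_u = power_error(x, eps_x, n)   # estimate from the derivative
low = (x - eps_x) ** n             # cube of the lower bound of x
high = (x + eps_x) ** n            # cube of the upper bound of x

print(f"eps_u = {eps_u:.4f}, bounds: {low:.2f} .. {high:.2f}")
```

The interval between `low` and `high` shows directly which digits of the cube are reliable; the linear estimate is close to half of its width.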


10. DIVISION 
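Suppose that


$$u = \frac{a}{b} \tag{2.17}$$


and that $\varepsilon_a$ and $\varepsilon_b$ are known. We need to determine $\varepsilon_u$. We have


$$u + \Delta_u = \frac{a + \Delta_a}{b + \Delta_b}, \tag{2.18}$$

$$\Delta_u = \frac{a + \Delta_a}{b + \Delta_b} - \frac{a}{b}, \tag{2.19}$$

or

$$\Delta_u = \frac{b\,\Delta_a - a\,\Delta_b}{b\,(b + \Delta_b)}. \tag{2.20}$$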




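We introduce the notation


$$\Delta'_u = \frac{b\,\Delta_a - a\,\Delta_b}{b^2}$$


and we find the difference $\Delta'_u - \Delta_u$:


$$\Delta'_u - \Delta_u = \frac{(b\,\Delta_a - a\,\Delta_b)\,\Delta_b}{b^2\,(b + \Delta_b)}.$$

This difference is a second-order infinitesimal if $\Delta_a$ and $\Delta_b$ are
considered first-order infinitesimals. Therefore, we may take


$$\Delta_u = \frac{b\,\Delta_a - a\,\Delta_b}{b^2},$$


so that

$$|\Delta_u| \le \frac{|b|\,|\Delta_a| + |a|\,|\Delta_b|}{b^2}. \tag{2.21}$$


Replacing the unknowns $|\Delta_a|$ and $|\Delta_b|$ with their maximum values $\varepsilon_a$
and $\varepsilon_b$, we get


$$\varepsilon_u = \frac{|b|\,\varepsilon_a + |a|\,\varepsilon_b}{b^2}. \tag{2.22}$$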


Example. Let us set $a = 5.36$, $b = 0.748$, $\varepsilon_a = 0.5\cdot 10^{-2}$, and $\varepsilon_b = 0.5\cdot 10^{-3}$. To simplify
the calculation of $\varepsilon_u$, let us take 6 instead of $a$ and 0.8 instead of $b$ in the numerator and
0.5 instead of $b^2$ in the denominator. Then we obtain


$$\varepsilon_u = \frac{3\cdot 10^{-3} + 4\cdot 10^{-3}}{0.5} = 1.4\cdot 10^{-2},$$


and finally, $\varepsilon_u = 2\cdot 10^{-2}$ (increasing the estimate for simplification).


Strictly speaking, it follows from this example that we cannot
vouch for the hundredths in the quotient. In such a case as this,
the hundredths are taken in the quotient because the error in that
digit is shown to exceed unity only slightly and the calculation is
made in accordance with the maximum possible error. However,
if we should need to carry out further operations on the quotient,
in calculating the error incurred in performing these other
operations, we should bear in mind the estimate of the error
obtained, namely, $2\cdot 10^{-2}$, and we should set the fraction equal to
7.17 ($2\cdot 10^{-2}$), where the limiting error is shown in parentheses.*
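Formula (2.22) can be evaluated directly for the numbers of this example. The Python sketch below is illustrative only; the function name `quotient_error` is an assumption, and no hand simplifications are made, so the result is smaller than the rounded-up estimate used in the text.

```python
def quotient_error(a, eps_a, b, eps_b):
    """Limiting absolute error of u = a / b according to formula (2.22)."""
    return (abs(b) * eps_a + abs(a) * eps_b) / b ** 2

a, eps_a = 5.36, 0.5e-2
b, eps_b = 0.748, 0.5e-3

u = a / b
eps_u = quotient_error(a, eps_a, b, eps_b)

# Exact bounds of the quotient for positive a and b, for comparison.
low = (a - eps_a) / (b + eps_b)
high = (a + eps_a) / (b - eps_b)

print(f"u = {u:.4f}, eps_u = {eps_u:.4f}, bounds: {low:.4f} .. {high:.4f}")
```

Both exact bounds fall inside the interval from 7.15 to 7.19 that corresponds to the rounded-up limiting error.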

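Let us now find the limiting relative error of a quotient. By
definition, we have


$$\delta_u = \frac{\varepsilon_u}{|u|} = \frac{|b|\,\varepsilon_a + |a|\,\varepsilon_b}{b^2}\cdot\frac{|b|}{|a|} = \frac{\varepsilon_a}{|a|} + \frac{\varepsilon_b}{|b|}, \qquad \delta_u = \delta_a + \delta_b. \tag{2.23}$$


*Sometimes, it is desirable to write the result in the form


$$7.15 < u < 7.19.$$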




This result shows that the question of the number of known 
digits in the case of division is solved just as in the case of multi- 
plication. Therefore, the rule given in the preceding section can 
be extended to the set of multiplications and divisions. The same 
statement applies to the uselessness of keeping factors, numerators, 
or denominators with greatly differing numbers of known digits. 


Example 1.

$$a = 2.43, \quad b = 0.216, \quad u = \frac{a}{b}, \qquad \varepsilon_a = 0.5\cdot 10^{-2}, \quad \varepsilon_b = 0.5\cdot 10^{-3}.$$


To make it possible to estimate the error in one's head, it is customary to increase
somewhat the number $a$ in the numerator of the formula for $\varepsilon_u$ and to decrease the value
of $b$ in the denominator. In the present case,


$$\varepsilon_u = \frac{3\cdot 0.5\cdot 10^{-3} + 0.3\cdot 0.5\cdot 10^{-2}}{0.2^2} = 75\cdot 10^{-3}.$$


We may take $\varepsilon_u = 10^{-1}$. Consequently, we can be sure only of the tenths in the quotient:
$u = 11.2$.

A second calculation can be made from the limiting relative error:


$$\delta_a = \tfrac{1}{4}\cdot 10^{-2}, \qquad \delta_b = \tfrac{1}{4}\cdot 10^{-2}, \qquad \delta_u = \tfrac{1}{2}\cdot 10^{-2}.$$

Since the first digit of the quotient is equal to unity, on the basis of the table at the end of
Chapter 1, it has three definitely known digits.

Both conclusions agree with the general rule: since the dividend and the divisor have
three definitely known digits each, the quotient cannot have more than three definitely
known digits.

Example 2. Suppose that $a = 3.144$ and that $b = 0.0536$. Let us determine the value of
$u = a/b$ and let us estimate the error. From the table at the end of Chapter 1, we find


$$\delta_a = 0.017\% \approx 0.02\%, \qquad \delta_b = 0.10\%, \qquad \delta_u = 0.12\%.$$


Since the first digit of the quotient is 5, we take $\delta_u = 0.1\%$ (not altogether legitimately).
We find from the table that there will be three known digits in the quotient. Although we
obtain in the quotient the same number of known digits as in the original number with the
fewer known digits, it is still significant, with such a small divisor, that $a$ is
given with four digits. If we were to take $a$ with only three definitely known digits, we
would obtain


$$\delta_a = 0.10\%, \qquad \delta_b = 0.10\%, \qquad \delta_u = 0.2\%,$$


and from the table we could be sure only of two digits. We have $u = 58.7$. If we take the
fraction correctly to the nearest half of a hundredth, $u$ will then be 58.66.

The control of the fraction in terms of its maximum and minimum yields $58.59 < u < 58.72$,
which shows clearly the inexactness of the hundredths and even some unreliability
in the tenths, since the digit in that position can be either 6 or 7 in accordance with these
bounds. It should be noted that the average of the lower and upper bounds coincides with
what is obtained in calculating the hundredths. To some extent, this justifies the use of
the extra digit in cases like this. If the dividend has four digits and the divisor only three,
we frequently take four digits in the quotient, remembering, of course, that the fourth
digit is unreliable.
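The control of a quotient by its maximum and minimum is easy to mechanize. The following Python sketch is an illustration under the assumption that both numbers are positive and rounded to the last digit shown; the function name `quotient_bounds` is hypothetical.

```python
def quotient_bounds(a, eps_a, b, eps_b):
    """Smallest and largest possible values of a / b for positive a and b."""
    return (a - eps_a) / (b + eps_b), (a + eps_a) / (b - eps_b)

a, eps_a = 3.144, 0.0005      # four digits, rounded
b, eps_b = 0.0536, 0.00005    # three digits, rounded

low, high = quotient_bounds(a, eps_a, b, eps_b)
middle = (low + high) / 2

print(f"{low:.2f} < u < {high:.2f}, middle = {middle:.2f}, a/b = {a / b:.2f}")
```

The middle of the interval agrees with the directly computed quotient to the hundredths, which is the observation used above to justify keeping the extra digit.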


Chapter 3 


ESTIMATE OF THE ERROR IN A FUNCTION 
WITH APPROXIMATE ARGUMENTS 


11. LIMITING ERRORS OF A FUNCTION OF A 
SINGLE INDEPENDENT VARIABLE 


Suppose that

$$U = f(x), \tag{3.1}$$


where $f(x)$ is a continuously differentiable function. If, in place of
the exact value of the argument $A$, we substitute its approximate
value $a$, the value $f(a)$ of the function will also be an
approximation. Let us find an expression for the limiting absolute error
of the function $\varepsilon_u$ in terms of the limiting absolute error in the
approximate value of the argument $\varepsilon_a$. The exact value of $f(A)$ can
be represented in the form


$$U = f(A) = f(a + \Delta_a) = f(a) + f'(a)\,\Delta_a + \cdots, \tag{3.2}$$


where $\Delta_a$ is the exact error of $a$. Treating $\Delta_a$ as an infinitesimal and
assuming that we may neglect terms containing second- and
higher-order infinitesimals, we obtain


$$\Delta_u \approx f'(a)\,\Delta_a; \tag{3.3}$$

hence,

$$\varepsilon_u = |f'(a)|\,\varepsilon_a. \tag{3.4}$$

Thus, the limiting absolute error of a function of a single
argument is equal to the product of the absolute value of the derivative
and the limiting absolute error of the argument.

For the limiting relative error, we obtain the formula


$$\delta_u = \left|\frac{a\,f'(a)}{f(a)}\right|\,\delta_a. \tag{3.5}$$




From this formula, it is clear that the limiting relative error is 
proportional to the logarithmic derivative of the function and the 
value of the argument. However, we note that the estimate ob- 
tained in this manner for the error of the function can be too 
crude in some cases, 


Example. Let us evaluate tan 85°, assuming the angle given with an accuracy up to
0.5° = 0.0087 radians. Then,


$$\varepsilon_{\tan} = \frac{1}{\cos^2 85^\circ}\cdot 0.0087 = 0.0087\cdot 132 = 1.15.$$


From the tables, tan 85° = 11.4. Therefore, the unknown actual value lies somewhere
between 11.4 - 1.15 = 10.25 and 11.4 + 1.15 = 12.55. Let us check this by a direct
determination of the tangents of the bounds:


tan 84.5° = 10.4; tan 85.5° = 12.7.


These bounds can be considered as exact. The bounds which we established by
means of the limiting absolute error should coincide with the exact bounds up to three
significant figures. The significant difference between the bounds, which exceeds the
admissible error in the calculations, indicates that the limiting absolute error is not
exact. It is easy to show that in the present case the discarding of terms of second- and
higher-order infinitesimals in the series was inadmissible for 3-digit calculations. The
first discarded term in the series exceeds 0.1, and, consequently, even the tenths in the
error are not reliable.


A general rule can be given for checking the suitability of a
method of determining the limiting absolute error of a function.
If we assume the existence and continuity of the second derivative
of the function in question, we may use Taylor's formula to write


$$f(A) - f(a) = f'(a)\,\Delta_a + \frac{1}{2}\,f''(\xi)\,\Delta_a^2, \tag{3.6}$$


where $\xi$ is a number somewhere between $a$ and $a + \Delta_a$. In
"linearizing" the right side, we incur the error

$$R = \frac{1}{2}\,f''(\xi)\,\Delta_a^2.$$


From this we obtain the estimate


$$|R| \le \frac{1}{2}\,M_2\,\varepsilon_a^2, \tag{3.7}$$


where $M_2 \ge |f''(x)|$ for $a - \varepsilon_a \le x \le a + \varepsilon_a$. If $|R|$ is a number
exceeding the unit of the first discarded digit, the "linearization"
will be crude. We must make the calculation with a smaller
degree of accuracy or introduce the correction $M_2\,\varepsilon_a^2/2$ in the
estimate of the error.
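The check by the second-order term can be sketched for the tan 85° example as follows. This Python fragment is an illustration, not the author's procedure; it assumes that the second derivative of the tangent, 2 tan x sec² x, increases over the interval in question, so that its largest value is taken at the upper bound of the angle.

```python
import math

a = math.radians(85.0)        # approximate argument
eps_a = math.radians(0.5)     # limiting absolute error of the argument

# First-order estimate, formula (3.4): |f'(a)| * eps_a with f'(x) = 1 / cos(x)**2.
linear = eps_a / math.cos(a) ** 2

# Bound (3.7) on the discarded term, with M2 = |f''| at the upper end a + eps_a.
x_max = a + eps_a
M2 = 2.0 * math.tan(x_max) / math.cos(x_max) ** 2
remainder = 0.5 * M2 * eps_a ** 2

print(f"linear estimate = {linear:.3f}, second-order bound = {remainder:.3f}")
```

The second-order bound comes out larger than 0.1, in agreement with the remark that the tenths of the linear estimate are not reliable here; adding it to the linear estimate covers the exact upper bound tan 85.5°.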


12. ERRORS IN THE SIMPLER ELEMENTARY FUNCTIONS 


In this section, we shall examine the errors in the basic elementary 
functions. 




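1. The functions sin x and cos x.


If $u = \sin a$, from the general formula (3.4), we have


$$\varepsilon_u = |\cos a|\,\varepsilon_a. \tag{3.8}$$

From this it is clear that

$$\varepsilon_u \le \varepsilon_a. \tag{3.9}$$

Analogously, we obtain

$$\varepsilon_{\cos a} = |\sin a|\,\varepsilon_a. \tag{3.10}$$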


From these expressions, we may derive the following procedure 
for carrying out the calculations. Because of the nature of the sine 
function, we lose little generality in assuming that the angle be- 
longs to the first quadrant. Then, if the values of the angles in the 
formulas are approximate, to reduce the error, we need to take 
the sine of the angle if the angle exceeds 45° and the cosine if the 
angle is less than 45°. This rule should be taken in those cases in 
which it is possible to choose a computed trigonometric function. 


Example. In a number of astronomical problems, we need to make calculations of the
following type: Suppose that $\alpha = r\sin\delta$ and $\beta = r\cos\delta$ are known. Find $\delta$ and $r$. We
evaluate $\tan\delta = \alpha/\beta$. From the tangent, we find an approximate value of the angle $\delta$.
To determine the value of $r$, we may use either of the formulas $r = \alpha/\sin\delta$ or $r = \beta/\cos\delta$.
Which of these two formulas to use should be determined by the rule stated above.

Suppose that $\alpha = 2.364$, that $\beta = 1.575$, that $\tan\delta = 1.501$, and that $\delta$ = 56° 20′.
Since $\delta$ > 45°, to determine the value of $r$, it is advisable to use the sine since the error
in the sine is less than that in the cosine. The values given or calculated for these
quantities are as follows:


    α* = r sin δ   |  2.364*
    sin δ          |  0.832     (3)
    β* = r cos δ   |  1.575*
    tan δ          |  1.501     (1)
    δ              |  56° 20′   (2)
    r              |  2.841     (5)
    cos δ          |  0.554     (4)
    (r)            |  2.843     (6)


The asterisks denote the given values for $\alpha$ and $\beta$, and the numbers in parentheses
indicate the order in which the operations are made. The last two entries give a control
for the calculations. The last entry is the control value of $r$.
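The choice between the two formulas for r can be written as a small routine. The Python sketch below is illustrative; the function name `radius_and_angle` is an assumption, and `math.atan2` replaces the table lookup of the angle from its tangent.

```python
import math

def radius_and_angle(alpha, beta):
    """Solve alpha = r sin(delta), beta = r cos(delta) for r and delta (radians)."""
    delta = math.atan2(alpha, beta)
    # Divide by the larger of |sin| and |cos|: that function changes more slowly
    # with the angle, so an error in delta affects r less.
    if abs(math.sin(delta)) >= abs(math.cos(delta)):
        r = alpha / math.sin(delta)
    else:
        r = beta / math.cos(delta)
    return r, delta

r, delta = radius_and_angle(2.364, 1.575)
print(f"delta = {math.degrees(delta):.3f} degrees, r = {r:.4f}")
```

With full machine precision both formulas give the same r; the rule matters when the angle is known only to a few digits, as in the table above.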


2. The functions tan x and cot x. 


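If $u = \tan a$ and $v = \cot a$, we have


$$\varepsilon_u = \frac{\varepsilon_a}{\cos^2 a}, \qquad \varepsilon_v = \frac{\varepsilon_a}{\sin^2 a}. \tag{3.11}$$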




It is clear from these formulas that


$$\varepsilon_{\tan a} \ge \varepsilon_a, \qquad \varepsilon_{\cot a} \ge \varepsilon_a. \tag{3.12}$$
tan? atid (3) cot? 


If the angle $a$ is close to 0, the error in the tangent is close to its
minimum and the error in the cotangent is great; consequently, the
suitability of the formula for $\varepsilon$ should be examined. The opposite
conclusions hold when the angle is close to 90°. Accordingly, we
need to arrange our calculations in such a way as to be able to
choose formulas containing either the tangent or the cotangent.


3. Determination of angle from a trigonometric function. 


Suppose that $u = \arcsin a$. Then,


$$\varepsilon_u = \frac{\varepsilon_a}{\sqrt{1 - a^2}}. \tag{3.13}$$


We can draw the following conclusions from this formula:


(a) $\varepsilon_u \ge \varepsilon_a$. Equality holds when $a = 0$, that is, when $u = 0$.

(b) If we need to determine the angle from the sine, the error 
will be small only if the angle (still assumed to belong to the first 
quadrant) is small. 

(c) It is not advisable to determine the value of the angle from
the sine for angles close to 90° because the error in the angle will
be much greater than the error in the sine. For example, if
$u = \arcsin 0.984$,

$$\varepsilon_u = \frac{0.5\cdot 10^{-3}}{\sqrt{1 - 0.984^2}} \approx 0.003,$$

that is, the error in the angle can exceed 10′. This can be seen
directly from the tables in which, correct to three decimal places,
sin 79° 40′ = 0.984 and sin 79° 50′ = 0.984.

The same remarks hold with regard to making determinations 
of the angle from the cosine with the difference that it is convenient 
to determine the angle from the cosine when the angle is close to 
90° but not when it is close to 0. 

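Suppose that $u = \arctan a$ and $v = \operatorname{arccot} a$. Then,


$$\varepsilon_u = \frac{\varepsilon_a}{1 + a^2}, \qquad \varepsilon_v = \frac{\varepsilon_a}{1 + a^2}. \tag{3.14}$$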


It is clear from these formulas that it is most suitable of all to 
determine the angle from the tangent or cotangent because the error 
in the angle is less than the error in the tangent. The most undesirable
case is the one in which a small angle is determined
from the tangent or a large one, close to 90°, is determined from
the cotangent, but even in these cases the error in the angle will
be close to the error in the function. Therefore, our computational
plans should be such as to make our calculations from the tangent 
or cotangent. 
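The difference between formulas (3.13) and (3.14) is easiest to see for an angle close to 90°. The Python sketch below is illustrative; the function names are assumptions, and both the sine and the tangent are taken as three-decimal table values with limiting error 0.0005.

```python
import math

def angle_error_from_sine(a, eps_a):
    """Formula (3.13): limiting error (radians) of u = arcsin a."""
    return eps_a / math.sqrt(1.0 - a * a)

def angle_error_from_tangent(a, eps_a):
    """Formula (3.14): limiting error (radians) of u = arctan a."""
    return eps_a / (1.0 + a * a)

u = math.radians(79.75)       # an angle close to 90 degrees
eps = 0.5e-3                  # three-decimal table values

from_sine = angle_error_from_sine(round(math.sin(u), 3), eps)
from_tangent = angle_error_from_tangent(round(math.tan(u), 3), eps)

# Convert radians to minutes of arc for comparison.
print(f"from sine: {math.degrees(from_sine) * 60:.2f}'")
print(f"from tangent: {math.degrees(from_tangent) * 60:.3f}'")
```

For this angle the error obtained from the sine is more than a hundred times the error obtained from the tangent.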


Example. Let us consider a problem that is frequently encountered in astronomical
calculations, namely, that of determining the equatorial spherical coordinates from the
rectangular coordinates $x$, $y$, and $z$. From the formulas relating these coordinates, we
can make the following table of the calculations:


    x* = r cos δ cos α      |   3.1448*
    cos α                   |   0.82509        (3)
    y* = r cos δ sin α      |  -2.1534*
    y : x = tan α           |  -0.68475        (1)
    x : cos α = r cos δ     |   3.8115         (4)
    cos δ                   |   0.98906        (7)
    z* = r sin δ            |   0.56843*
    z : r cos δ = tan δ     |   0.14914        (5)
    α                       |   325° 35′ 54″   (2)
    δ                       |   8° 28′ 58″     (6)
    r cos δ : cos δ = r     |   3.8537         (8)


These calculations involve five significant figures. Therefore, the angles are given
with an accuracy up to 1″. The values with the asterisks are given. The numbers in the
parentheses indicate the order in which the operations are performed.
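The scheme of this example, in which both angles are found from tangents, can be sketched in Python as follows. This is an illustration rather than the tabular procedure itself: the function name is hypothetical, `math.atan2` is used so that the quadrant of the first angle follows from the signs of x and y, and r cos δ is formed with `math.hypot` instead of x : cos α.

```python
import math

def rectangular_to_equatorial(x, y, z):
    """Return (alpha, delta, r), angles in radians, from rectangular coordinates."""
    alpha = math.atan2(y, x) % (2.0 * math.pi)   # from tan(alpha) = y / x
    r_cos_delta = math.hypot(x, y)
    delta = math.atan2(z, r_cos_delta)           # from tan(delta) = z / (r cos(delta))
    r = math.hypot(r_cos_delta, z)
    return alpha, delta, r

alpha, delta, r = rectangular_to_equatorial(3.1448, -2.1534, 0.56843)
print(f"alpha = {math.degrees(alpha):.4f} deg, "
      f"delta = {math.degrees(delta):.4f} deg, r = {r:.4f}")
```

The results agree with the table to within the accuracy of five-figure computation.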


4. Powers and roots.

If $u = a^n$, then


$$\varepsilon_u = |n\,a^{n-1}|\,\varepsilon_a, \qquad \delta_u = |n|\,\delta_a. \tag{3.15}$$


It follows from these simple formulas that if $|n| > 1$, the relative
error of the power is greater than the relative error of the original
number. On the other hand, if $|n| < 1$ (that is, if we are taking an
integral root of a number), the operation decreases the error. If
the number $n$ is of the order of two or three on the one hand or
1/2 or 1/3 on the other, it follows from the formula for the relative
error that we can take as many significant figures in the result as
in the original number.


Example.


$$\pi^2 = 9.86, \qquad \sqrt{\pi} = 1.77, \qquad \text{if we take } \pi = 3.14;$$

$$\pi^2 = 9.8696, \qquad \sqrt{\pi} = 1.7725, \qquad \text{if we take } \pi = 3.1416.$$


We can agree to consider the values of $\pi^2$ and $\sqrt{\pi}$ shown with five significant figures as
exact and the values shown with three significant figures as approximate. Comparison of
the approximate with the "exact" figures shows in this case that the result contains
either three known digits (as in the case of the root) or, to be formally rigorous, two
digits (as in the case of the power) since in rounding off the five-digit value of $\pi^2$ to
three digits, we need, by the rule of rounding off, to take 9.87, whereas the figure
obtained is 9.86.
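The behaviour of the error under squaring and under extraction of a square root can be compared against formula (3.15) with a few lines of Python. The sketch is illustrative; the function name `power_limiting_error` is an assumption, and the value of pi supplied by the `math` module plays the part of the exact number.

```python
import math

def power_limiting_error(a, eps_a, n):
    """Formula (3.15): limiting absolute error of a**n for a > 0."""
    return abs(n) * a ** (n - 1) * eps_a

a, eps_a = 3.14, 0.005        # pi rounded to three digits

for n in (2, 0.5):
    approx = a ** n
    bound = power_limiting_error(a, eps_a, n)
    actual = abs(math.pi ** n - approx)
    print(f"n = {n}: value {approx:.4f}, bound {bound:.4f}, actual error {actual:.4f}")
```

In both cases the actual error lies within the bound, and the bound for the root is much smaller than the bound for the square.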


5. Logarithms and exponentials. 


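Suppose that $u = \log a$. Since $a = 10^u$, we have $u = (\log e)\ln a$,
where $\ln a$ is the natural logarithm and $\log e = 0.4343 < 0.5$.
By formula (3.4), $\varepsilon_u = 0.4343\,\varepsilon_a/|a| = 0.4343\,\delta_a$.
Therefore, we need to take


$$\varepsilon_u = \frac{1}{2}\,\delta_a. \tag{3.16}$$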


The limiting relative error $\delta_a$ in the approximate number $a$ does
not depend on the position of the decimal, as was shown in Section
5 of Chapter 1, but depends primarily on the number of known digits
in that number. If $a$ is rounded off, according to Section 5, we may
take


$$\delta_a = \frac{1}{2C\cdot 10^{s-1}}, \tag{3.17}$$


where $C$ is the first significant figure and $s$ is the number of
definitely known digits. It follows from this that

$$\varepsilon_u = \frac{1}{4C\cdot 10^{s-1}}. \tag{3.18}$$

Here, if $C > 2.5$,

$$\varepsilon_u < \frac{1}{10^{s}}, \tag{3.19a}$$

and if $C < 2.5$,

$$\varepsilon_u > \frac{1}{10^{s}}. \tag{3.19b}$$


On the basis of this, we may formulate the following rule: if the 
first digit of the approximate number is greater than 2, its 
logarithm will have as many known digits after the decimal as 
there are known digits in the number; if the first digit is less than 
3, the last digit of the logarithm, taken with five digits, may not be 
completely reliable: an error of one or two units in the last digit 
is possible (since, when $C = 1$, we have $\varepsilon_u = 2.5\cdot 10^{-5}$). A somewhat
simplified rule is usually followed: the number of digits to the 
right of the decimal in a logarithm should be equal to the number 
of definitely known digits in the approximate number. In many 
applied problems of a technological nature, it is sufficient to get a 
relative error corresponding to three or four known digits since 
the original data contains only that many digits. If observations 
yield three digits, the calculations can be performed on a slide 
rule. In the case of 4-digit results of observation, either a calculating
machine or a 4-digit table of logarithms can be used. When
the number of digits is greater than three or four, calculating
machines are most often used at the present time. 
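The rule for the number of reliable digits in a logarithm follows from formula (3.18) and can be tabulated quickly. The Python sketch below is illustrative; the function name `log10_limiting_error` is an assumption, and the five-digit number used for the check is an arbitrary choice.

```python
import math

def log10_limiting_error(C, s):
    """Formula (3.18): limiting absolute error of log a when a has first digit C
    and s definitely known digits."""
    return 1.0 / (4 * C * 10 ** (s - 1))

s = 5
for C in range(1, 10):
    eps_u = log10_limiting_error(C, s)
    verdict = "below" if eps_u < 10.0 ** (-s) else "above"
    print(f"C = {C}: eps_u = {eps_u:.2e} ({verdict} one unit of the fifth decimal)")

# A direct check on one rounded number with first digit 1.
exact, rounded = 1.23457, 1.2346
actual = abs(math.log10(exact) - math.log10(rounded))
print(f"actual error = {actual:.2e}")
```

For first digits 1 and 2 the limiting error exceeds one unit of the fifth decimal; from 3 upward it falls below that unit.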

It is easy to investigate the error of the inverse operation, 
that is, to determine a number from its logarithm. The result of 
this inverse operation is the exponential function or antilogarithm. 

It follows from the formula for the limiting absolute error of a
logarithm that


$$\delta_a = \frac{1}{0.4343}\,\varepsilon_u. \tag{3.20}$$


From the definition of a limiting error, the right side can be
increased; therefore, we take


$$\delta_a = 2.5\,\varepsilon_u \quad (\text{if } u = \log a). \tag{3.21}$$


If the logarithm ($u$) has $s$ digits to the right of the decimal, in the
case that $u$ is rounded off,


$$\delta_a = \frac{2.5}{2}\cdot 10^{-s}, \tag{3.22}$$


which means that the number almost always has $s$ known digits. If
the logarithm is obtained without rounding off, the relative error
of the antilogarithm will be 2.5 times as great as the limiting
error of the logarithm. Therefore, the number has either $s$ or
$s - 1$ known digits.


6. Logarithms of trigonometric functions.

(a) $u = \log\sin a$. Then,


$$\varepsilon_u = 0.4343\,\cot a\cdot\varepsilon_a. \tag{3.23}$$


Let us assume that the angle is in the first quadrant. For the
limiting error of the logarithm to be less than the limiting error
of the angle, it will be necessary that $0.4343\cot a < 1$, or
$\tan a > 0.4343$, in other words, that $a$ > 23° 30′.

From this it is clear that it is not suitable in calculations to 
take the logarithm of the sine of an angle less than 23.5°. It is 
suggested that the reader verify that it is not suitable to take the 
logarithm of the cosine of an angle if the angle is greater than 
66.5°.* If the angle is less than 66.5° but greater than 23.5°, the 
limiting errors of the logarithms of both sine and cosine are less 
than the limiting error in the angle. (We note that the concept of 
‘‘suitability’’ of calculation has to do with just this.) 


*It should be recalled that the limiting error of the angle has to be expressed in 
radians in these error calculations, 


(b) $u = \log\tan a$. Then,


$$\varepsilon_u = 0.4343\,\cot a\,\sec^2 a\cdot\varepsilon_a = \frac{0.8686}{\sin 2a}\,\varepsilon_a. \tag{3.24}$$


In order for the error in the function to be less than the error in
the angle, it will be necessary that $\sin 2a > 0.8686$, that is, that
60° 18′ < 2a < 119° 42′ or (more crudely) 30° < a < 60°. Since
$\log\cot a = -\log\tan a$, the estimate of the error for the logarithm
of the cotangent is the same as for the logarithm of the tangent.

On the basis of what has been said, we can formulate the follow- 
ing rule for a rational organization of the calculations when using 
logarithms of trigonometric functions. 

If the angle is less than 23.5°, we should take the logarithm of 
the cosine (recasting the formulas if possible). If the angle lies 
between 23.5° and 30°, it is best to take the logarithms of the sine 
and cosine. If the angle lies between 30° and 60°, we may use the 
logarithms of any of the four functions—sine, cosine, tangent, or 
cotangent. If the angle lies between 60° and 66.5°, we should take 
the logarithms of the sine and cosine. Finally, if the angle is 
greater than 66.5°, it is best to take only the logarithm of the sine. 
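The rule just stated amounts to comparing, for a given angle, the coefficients that multiply the error of the angle in formulas (3.23) and (3.24). A minimal Python sketch follows; the function names are assumptions, the angle is taken strictly between 0 and 90°, and a function is called suitable when its coefficient is less than unity.

```python
import math

LOG10_E = math.log10(math.e)   # 0.4343...

def log_trig_coefficients(angle_deg):
    """Ratio eps_u / eps_a for the logarithm of each trigonometric function."""
    a = math.radians(angle_deg)
    tangent_like = 2.0 * LOG10_E / math.sin(2.0 * a)
    return {
        "log sin": LOG10_E / math.tan(a),
        "log cos": LOG10_E * math.tan(a),
        "log tan": tangent_like,
        "log cot": tangent_like,
    }

def suitable_functions(angle_deg):
    """Functions whose logarithm has a smaller limiting error than the angle."""
    return [name for name, c in log_trig_coefficients(angle_deg).items() if c < 1.0]

for angle in (10, 27, 45, 63, 80):
    print(angle, suitable_functions(angle))
```

The five sample angles fall into the five intervals of the rule and reproduce its recommendations.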


7. Determination of angles from the logarithms of the 
trigonometric functions. 


(a) If $\log\sin u = a$, then $\sin u = 10^a$ and $u = \arcsin 10^a$. Therefore,


$$\varepsilon_u = \frac{10^a}{\sqrt{1 - 10^{2a}}}\cdot\frac{1}{0.4343}\,\varepsilon_a \tag{3.25a}$$


or


$$\varepsilon_u = \frac{\tan u}{0.4343}\,\varepsilon_a. \tag{3.25b}$$


As in the preceding sections, we shall take the condition
$\varepsilon_u \le \varepsilon_a$ as the criterion of suitability of an approximate calculation.
In the present case, this condition leads to the inequality $u$ < 23.5°.
Thus, the calculation of an angle from the logarithm of its sine is
most satisfactory in the case of angles less than 23.5°. Analogously,
we may show that calculating an angle from the logarithm of its
cosine is suitable if the angle is greater than 66.5°.

In a case where we must calculate an angle from the logarithm 
of its sine or cosine, the following rules are followed: if possible, 
angles less than 45° are calculated from the logarithm of the sine, 
and angles greater than 45° from the logarithm of the cosine. In 
the case of angles between 23.5° and 66.5°, this leads to a greater 
limiting error than the error of the logarithm that is used. In 
particular, if the angle is equal to 45°, the error in the angle will 
be 2.3 times as great as the error in the logarithm of the sine (or 
cosine). 



Example. Suppose that $\log\sin u = 9.3847 - 10$. From 4-place tables of the logarithms
of trigonometric functions, we find that $u$ = 14° 02′. From tables of the trigonometric
functions, we find that $\tan u = 0.2500$. Therefore,


$$\varepsilon_u = \frac{0.2500}{0.4343}\cdot\frac{1}{2}\cdot 10^{-4} = 3\cdot 10^{-5}.$$

The estimate of the error in the angle is obtained in radians. Conversion to degree
measure gives $\varepsilon_u$ = 6″ = 0.1′. Since the angle is sufficiently small, it can be calculated
with an accuracy up to 0.1′. By interpolation, we obtain $u$ = 14° 02.0′.
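Formula (3.25b) can be applied to this example with a short Python sketch. It is an illustration only: the function name is hypothetical, the angle is obtained with `math.asin` instead of a table, and the logarithm is assumed to be rounded to four decimal places.

```python
import math

def angle_error_from_log_sine(log_sin_u, eps_a):
    """Return the angle u (radians) and its limiting error by formula (3.25b)."""
    u = math.asin(10.0 ** log_sin_u)
    eps_u = math.tan(u) / math.log10(math.e) * eps_a
    return u, eps_u

# log sin u = 9.3847 - 10, with limiting error of half a unit of the fourth decimal.
u, eps_u = angle_error_from_log_sine(9.3847 - 10, 0.5e-4)

print(f"u = {math.degrees(u):.4f} degrees")
print(f"eps_u = {eps_u:.2e} rad = {math.degrees(eps_u) * 3600:.1f} seconds of arc")
```

The limiting error comes out close to 6 seconds of arc, that is, about a tenth of a minute.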


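(b) If $\log\tan u = a$, then $\tan u = 10^a$ and $u = \arctan 10^a$. Then,


$$\varepsilon_u = \frac{10^a}{1 + 10^{2a}}\cdot\frac{\varepsilon_a}{0.4343} \tag{3.26a}$$

or

$$\varepsilon_u = \frac{\sin 2u}{0.8686}\,\varepsilon_a. \tag{3.26b}$$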


The condition under which it is suitable to calculate the angle from
the logarithm of its tangent is that


$$\sin 2u < 0.8686,$$


that is, that 2u < 60° or 2u > 120°. From this it follows that it is
suitable to determine angles between 0 and 30° and between 60° and
90° from the logarithm of the tangent. In the interval between 30°
and 60°, the error in the angle is greater than that in the logarithm
of the tangent being used, but this increase is not great: if the
angle is close to 45°, its limiting error will be 1.15 times as great
as the error in the logarithm of the tangent.

Since $\log\cot u = -\log\tan u$, the limiting error of the angle is
obtained from the logarithm of the cotangent in the same way as in
the preceding case.

Comparison of the errors in determining an angle from the 
logarithms of the trigonometric functions shows that it is most 
satisfactory to determine the angle from the logarithm of the 
tangent or the cotangent (since throughout the entire interval from 
0 to 90° the limiting error of the angle does not exceed the product 
of the limiting error of the given logarithm and the number 1.15) 
and that in two thirds of this interval (from 0 to 30° and from 60° 
to 90°) the limiting error of the angle is less than the limiting 
error of the logarithm of the tangent. Therefore, with logarithmic 
calculations also we try to choose formulas and schemes that will 
allow us to calculate the angles from the logarithms of the tangents 
or cotangents. 


13. THE ERROR INVOLVED IN FUNCTIONS OF SEVERAL 
VARIABLES 


To shorten the writing, we shall confine ourselves to seeking the 
error involved in the case of functions of two variables. The
formula that we shall derive is easily generalized to the case of an
arbitrary number of variables. 

Suppose that $U = f(x, y)$ is a continuously differentiable function
defined on some set of values of the arguments $x$ and $y$. Suppose that
we replace the exact values of the arguments with their approximate
values $a$ and $b$. Then, we shall obtain an approximate value of the
function $u = f(a, b)$. Let us calculate the limiting absolute error of
the approximate value of the function, assuming that we know the
limiting absolute errors of the arguments $\varepsilon_a$ and $\varepsilon_b$.

We shall denote the exact errors in the arguments by $\Delta_a$ and $\Delta_b$.
From the definition of exact error (1.1), we have


$$A = a + \Delta_a, \qquad B = b + \Delta_b,$$


where $A$ and $B$ are the exact values of the arguments. The exact
value of the function is


$$U = f(a + \Delta_a,\; b + \Delta_b).$$


Let us assume that the exact errors are small, so that we can
neglect their squares and higher powers. Let us expand the right
side of the above equation in powers of the exact errors and let us
cut off the expansion at the terms containing the first powers of
the errors. Then, we obtain


$$\Delta_u = U - u = \left(\frac{\partial U}{\partial x}\right)_{x=a,\,y=b}\Delta_a + \left(\frac{\partial U}{\partial y}\right)_{x=a,\,y=b}\Delta_b. \tag{3.27}$$


In this equation, $\Delta_u$ represents an approximate value of the exact
error since only the first two terms in the expansion are kept. It
follows from this equation that


$$|\Delta_u| \le \left|\frac{\partial U}{\partial x}\right|_{x=a,\,y=b}|\Delta_a| + \left|\frac{\partial U}{\partial y}\right|_{x=a,\,y=b}|\Delta_b|.$$


If we replace the absolute values of the exact errors with their
limiting absolute errors, we shall obtain an upper bound on the
absolute value of the error in the function, a value that we can take
as the limiting absolute error in the function:


$$\varepsilon_u = \left|\frac{\partial U}{\partial x}\right|_{x=a,\,y=b}\varepsilon_a + \left|\frac{\partial U}{\partial y}\right|_{x=a,\,y=b}\varepsilon_b. \tag{3.28}$$

This formula is sometimes called the differential formula for 
the estimate of an error, since the right side is analogous to the 
expression for the total differential. For if we replace the dif- 
ferentials of the arguments in such an expression with the limiting 
absolute errors, and if we replace the partial derivatives of the 


36 Mathematical Analysis of Observations 


function with their absolute values, we obtain the formula just 
derived. It is used both for computing an estimate in specific 
numerical values and for analyzing the accuracy and clarifying the 
conditions under which the accuracy of the result can be improved 
(that is, under which the limiting absolute error of the function will 
be decreased). 

We note that the formula with which we began (that is, the 
formula containing the first two terms of the expansion) is some- 
times given for the problem in question. This is meaningful only 
in those cases in which the exact errors in the arguments are 
considered as given. Then, the formula gives an approximate value 
of the error of the function. However, if the exact errors in the 
arguments are unknown, as is usually the case, and only the 
limiting errors are known, we need to apply the formula derived in 
this section. 
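The differential formula for the limiting error extends to any number of arguments, and its structure can be sketched in Python. In this illustration the partial derivatives are approximated by central differences, which is an assumption made here for convenience and not part of the derivation; the function name `limiting_error` and the step size `h` are likewise hypothetical.

```python
def limiting_error(f, args, eps, h=1e-6):
    """Sum of |partial derivative| * limiting error over all arguments of f."""
    total = 0.0
    for i, (x, e) in enumerate(zip(args, eps)):
        step = h * max(1.0, abs(x))
        upper = list(args)
        lower = list(args)
        upper[i] = x + step
        lower[i] = x - step
        partial = (f(*upper) - f(*lower)) / (2.0 * step)  # central difference
        total += abs(partial) * e
    return total

# Check against the closed formula for a quotient, (|b| eps_a + |a| eps_b) / b**2.
a, b, eps_a, eps_b = 5.36, 0.748, 0.5e-2, 0.5e-3
numeric = limiting_error(lambda x, y: x / y, (a, b), (eps_a, eps_b))
closed = (abs(b) * eps_a + abs(a) * eps_b) / b ** 2
print(f"numeric = {numeric:.6f}, closed form = {closed:.6f}")
```

For a quotient the numerical value agrees with the closed formula obtained in Section 10, which is a special case of the general formula.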

Let us consider some examples of the application of this 
formula. 


Example 1. The simplest formula for determining the time at which a star will rise
and set is of the form


$$t = \arccos\left[(-\tan\varphi)\tan\delta\right],$$


where $\varphi$ is the latitude of the point at which an observation is made, $\delta$ is the declination
of the star, and $t$ is the hour angle (defined by some convention). In using this formula,
we assume that $|(\tan\varphi)\tan\delta| < 1$, that is, that the star does actually rise and set.

From the formula derived above, we obtain the following expression for the limiting
absolute error of the hour angle:


$$\varepsilon_t = \frac{|\sin\delta\,\sec\varphi|\,\varepsilon_\varphi + |\sin\varphi\,\sec\delta|\,\varepsilon_\delta}{\sqrt{\cos(\varphi + \delta)\cos(\varphi - \delta)}}.$$


The symmetry of the formula with regard to $\varphi$ and $\delta$ is a result of the symmetry in the
formula from which the calculations are made.

Let us calculate the limiting error for the calculation of the hour angle for the rise
of Mars at Simferopol on February 3, 1948.

From the table of latitudes of the more important cities in the Soviet Union contained
in the book Kurs obshchei astronomii (Course in General Astronomy) by S. N. Blazhko,
we find that $\varphi$ = 44° 57′. From the astronomical calendar for the year 1948, we find
that $\delta$ = 14° 20′. Since these numbers are taken from more exact tables with rounding
off, we may take $\varepsilon_\varphi = \varepsilon_\delta$ = 0.5′ or $\varepsilon_\varphi = \varepsilon_\delta$ = 0.00015 radians. (We recall that in these
calculations, the limiting error of an angle must be given in radians. Therefore, the quantity
0.5′ = 30″ must be divided by the number of seconds in a radian, which is 206,265. As a
simplification, we take 200,000, which is permissible since such a substitution does not
decrease but increases the limiting error.) For simplicity, all the factors should be
replaced with as simple ones as possible, but in such a way that the error is not
decreased. It is easy to see that in the present problem all the angles should be increased.
Therefore, we take $\varphi$ = 45°, $\delta$ = 15°, $\varphi + \delta$ = 60°, and $\varphi - \delta$ = 31°. By using these values
for the angles and a 2- or 3-place table of trigonometric functions or a slide rule, we
obtain


$$\varepsilon_t = \frac{0.37\,\varepsilon_\varphi + 0.74\,\varepsilon_\delta}{0.65} = 0.00026 \text{ radians}.$$


The hour angle is calculated from the formula with a limiting absolute error of about
52″.

Let us now consider the question of the most satisfactory conditions of observation,
that is, those under which the limiting error will be the smallest. Since neither the
choice of point of observation nor the choice of the celestial body is arbitrary in this
problem, the solution of the question can be, so to speak, only passive; that is, we can
only ascertain which celestial bodies and which latitudes enable us to calculate the hour
angle of the rising and setting of the body with the least and which with the greatest
accuracy. Since the two arguments $\varphi$ and $\delta$ are symmetric, we need only investigate the
effect of one of them on the result. It is easy to see that as $\varphi$ increases from 0 to 90°
with $\delta$ held constant, the coefficients of $\varepsilon_\varphi$ and $\varepsilon_\delta$ increase monotonically. We would
obtain an analogous conclusion with $\delta$ increasing monotonically and $\varphi$ held constant. It
follows from this that the hour angles of rising and setting of the body are determined
with least error close to the equator of the earth and for bodies that are close to the
celestial equator.
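The limiting-error formula of Example 1 can also be evaluated at the tabulated angles themselves, without rounding them upward. The Python sketch below is illustrative; the function name `hour_angle_error` is an assumption, and the latitude and declination are taken to be non-negative with their sum less than 90°, so that the star rises and sets.

```python
import math

def hour_angle_error(phi, delta, eps_phi, eps_delta):
    """Limiting absolute error (radians) of t = arccos(-tan(phi) tan(delta))."""
    root = math.sqrt(math.cos(phi + delta) * math.cos(phi - delta))
    coeff_phi = abs(math.sin(delta) / math.cos(phi)) / root
    coeff_delta = abs(math.sin(phi) / math.cos(delta)) / root
    return coeff_phi * eps_phi + coeff_delta * eps_delta

phi = math.radians(44 + 57 / 60)      # latitude 44 deg 57 min
delta = math.radians(14 + 20 / 60)    # declination 14 deg 20 min
eps = math.radians(0.5 / 60)          # half a minute of arc, in radians

eps_t = hour_angle_error(phi, delta, eps, eps)
print(f"eps_t = {eps_t:.6f} rad = {math.degrees(eps_t) * 3600:.0f} seconds of arc")

# The error grows with latitude when the declination is held constant.
for lat in (0, 30, 60):
    print(lat, hour_angle_error(math.radians(lat), delta, eps, eps))
```

Because nothing is rounded upward here, the result is somewhat smaller than an estimate made with simplified factors; the loop illustrates the monotonic growth of the error with latitude.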

Example 2. Determination of the astronomical latitude of the point of observation
depends on the basic formula


$$\cos z = \sin\varphi \sin\delta + \cos\varphi \cos\delta \cos t,$$


where $z$ is the geocentric zenithal distance of the body in question, $\varphi$ is the unknown
latitude, and $\delta$ is the declination of the star. Let us set


$$t = s - \alpha, \qquad s = T + u,$$


where $\alpha$ is the right ascension of the body, $T$ is the reading of a chronometer, and $u$ is
the correction of the chronometer (so that $s$ is the corrected chronometer time). The
declination and the right ascension can be obtained from an almanac for the instant of
observation. Here, the geocentric values must be taken. Let us suppose that $\alpha$ and $\delta$ do
not contain errors or, more precisely, that the errors in the coordinates are considerably
less than the errors in those quantities that are obtained from observations.

Therefore, we shall assume that errors are contained only in the measured value of
the zenithal distance and in the value of $t$, since it is impossible to separate the error
in the reading of the chronometer from the error in the correction of the chronometer.
Instead of solving explicitly for the latitude in terms of the other quantities, we shall
proceed in the following fashion. First we differentiate both sides of the formula. After
replacing the differentials with the exact errors, we obtain


$$-\sin z\,\Delta_z = \cos\varphi \sin\delta\,\Delta_\varphi - \sin\varphi \cos\delta \cos t\,\Delta_\varphi - \cos\varphi \cos\delta \sin t\,\Delta_t.$$


For the spherical triangle with vertices at the zenith, the pole, and the star, we have
the formulas


$$\cos\varphi \sin\delta - \sin\varphi \cos\delta \cos t = -\sin z \cos A,$$
$$\cos\delta \sin t = \sin z \sin A,$$


where $A$ is the azimuth.


Substituting these formulas into the differentiated equation and dividing by
$-\sin z \cos A$, we can write the expression for the exact error in the latitude in a
sufficiently simple form:


$$\Delta_\varphi = \sec A\,\Delta_z - \cos\varphi \tan A\,\Delta_t.$$


Replacing the errors with the limiting values of the errors and taking the absolute
values of all the factors gives a formula for calculating the limiting absolute error in
the latitude:


$$\varepsilon_\varphi = |\sec A|\,\varepsilon_z + \cos\varphi\,|\tan A|\,\varepsilon_t.$$


The method of application of this formula in particular cases is obvious. Therefore, we
shall not consider a numerical example. From the formula, it is easy to derive the
condition under which the limiting error in the latitude will be at a minimum, namely,
when the star is on the meridian ($A$ = 0 or 180°). In this case, the coefficient of
$\varepsilon_z$ will have its minimum value, namely, unity, and the coefficient of $\varepsilon_t$ will vanish.

Two practical deductions can be made from this: (1) observations for determining the
latitude should be made when the star is close to the meridian (since it is not easy to
ensure observations exactly at the meridian), and (2) it is extremely important for the
zenithal distance to be measured as exactly as possible, since under all circumstances we
have the obvious inequality $\varepsilon_\varphi \geq \varepsilon_z$, with equality holding only when the
observations are on the meridian.
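
The dependence of the limiting error in the latitude on the azimuth can be checked numerically. The following is only a minimal Python sketch under stated assumptions: the function name and the sample values (latitude 50°, $\varepsilon_z$ = 1″, $\varepsilon_t$ = 1.5″, both in seconds of arc) are hypothetical and do not come from the text.

```python
import math

def latitude_limiting_error(azimuth_deg, latitude_deg, eps_z, eps_t):
    """Limiting absolute error of the latitude:
    eps_phi = |sec A| * eps_z + cos(phi) * |tan A| * eps_t.
    eps_z and eps_t must share one angular unit; the result is in that unit."""
    A = math.radians(azimuth_deg)
    phi = math.radians(latitude_deg)
    return abs(1.0 / math.cos(A)) * eps_z + math.cos(phi) * abs(math.tan(A)) * eps_t

# Hypothetical inputs chosen for illustration only.
for azimuth in (0, 10, 30, 60):
    print(azimuth, round(latitude_limiting_error(azimuth, 50.0, 1.0, 1.5), 3))
```

On the meridian the function returns $\varepsilon_z$ itself, and the value grows as the azimuth moves away from 0 or 180°, in agreement with the two deductions above.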

Example 3. To determine the clock correction (the correction of the chronometer), we use
the same formula as in determining the latitude, but here we assume the latitude known.

For calculating the clock correction, we must write the formula in the form


$$\cos z = \sin\varphi \sin\delta + \cos\varphi \cos\delta \cos (T - \alpha + u)$$


(with the same notation as in the preceding example).

As in the preceding example, we shall disregard errors in the spherical equatorial
coordinates. We make the substitution


$$\Delta_t = \Delta_T + \Delta_u$$

in the error formula of the preceding example. Let us solve for $\Delta_u$. Then,

$$\Delta_u = -\Delta_T - \sec\varphi \cot A\,\Delta_\varphi + \sec\varphi \csc A\,\Delta_z.$$

Passage to the limiting absolute errors gives the formula

$$\varepsilon_u = \varepsilon_T + \sec\varphi\,|\cot A|\,\varepsilon_\varphi + \sec\varphi\,|\csc A|\,\varepsilon_z.$$


There are no difficulties involved in calculating the limiting error from this formula.
As is usual in such calculations, we may make crude approximations to the values of the
angles $A$ and $\varphi$ in such a way that the limiting error will not decrease but increase.

It is clear from the formula that we have obtained that the determination of the
clock correction will be made with the minimum error if the heavenly body is on
the prime vertical (azimuth equal to 90° or 270°). The minimum value of $\varepsilon_u$ is equal to
$\varepsilon_T + \varepsilon_z \sec\varphi$.
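
As an illustration of how this limiting error behaves, here is a short Python sketch. It assumes that all errors are expressed in one common angular unit; the function name and the sample numbers (latitude 60°, $\varepsilon_T$ = 0.1, $\varepsilon_\varphi$ = $\varepsilon_z$ = 1) are hypothetical.

```python
import math

def clock_correction_limiting_error(azimuth_deg, latitude_deg, eps_T, eps_phi, eps_z):
    """eps_u = eps_T + sec(phi) * (|cot A| * eps_phi + |csc A| * eps_z).
    The azimuth must not be 0 or 180 degrees, where cot A and csc A are undefined."""
    A = math.radians(azimuth_deg)
    sec_phi = 1.0 / math.cos(math.radians(latitude_deg))
    cot_A = abs(math.cos(A) / math.sin(A))
    csc_A = abs(1.0 / math.sin(A))
    return eps_T + sec_phi * (cot_A * eps_phi + csc_A * eps_z)

# Hypothetical inputs: the error is smallest on the prime vertical (A = 90 degrees).
for azimuth in (30, 60, 90, 120):
    print(azimuth, round(clock_correction_limiting_error(azimuth, 60.0, 0.1, 1.0, 1.0), 3))
```

At an azimuth of 90° the term in $\varepsilon_\varphi$ drops out and the result reduces to $\varepsilon_T + \varepsilon_z \sec\varphi$.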

Example 4. The acceleration due to gravity is determined by means of a swinging
pendulum from the formula


$$g = \pi^2 \frac{l}{P^2};$$

here $l$ is the reduced length of the pendulum and $P$ is the period of oscillation.
Observations yield the following values and their limiting errors:


$l$ = 50.02 cm, $\varepsilon_l$ = 0.01 cm,
$P$ = 0.7098 sec, $\varepsilon_P$ = 0.0001 sec.


Let us calculate the acceleration due to gravity $g$ and its limiting error. We shall take
3.1416 as the value of $\pi$; that is, we shall set $\varepsilon_\pi$ = 0.00005.
If we apply the basic formula to the present case, we obtain


$$\varepsilon_g = \frac{2\pi l}{P^2}\,\varepsilon_\pi + \frac{\pi^2}{P^2}\,\varepsilon_l + \frac{2\pi^2 l}{P^3}\,\varepsilon_P,$$


which can be conveniently written in the form

$$\varepsilon_g = \pi P^{-3}\,(2Pl\,\varepsilon_\pi + \pi P\,\varepsilon_l + 2\pi l\,\varepsilon_P).$$


For simplicity's sake, let us take $P$ = 1, $l$ = 50, $\pi$ = 4, and $2\pi$ = 7 in the parentheses, and
let us take the factor in front of the parentheses equal to 10 (on the basis of the rough
estimate made by taking $\pi$ = 3.14 and $0.7^3$ = 0.343). This estimate gives the factor a value
of about 9, but since we decreased the value somewhat, it is safer to take the
value 10. Then, we obtain $\varepsilon_g = 10\,(100 \cdot 5 \cdot 10^{-5} + 4 \cdot 10^{-2} + 7 \cdot 50 \cdot 10^{-4})$, or $\varepsilon_g$ = 0.8 cm/sec².
The error is noticeably overestimated, but the result is obtained quite simply anyway and,
in fact, the entire calculation can be done in one's head. A more accurate calculation,
explained in the book Matematicheskaya obrabotka rezul'tatov izmerenii (Mathematical
Analysis of the Results of Measurements) by K. P. Yakovlev, from which this example
is borrowed, gives an estimate of the error of 0.5 cm/sec². When the calculations are
performed, we obtain $g$ = 978.0. Considering the estimate of the error that we have
obtained, we must take


$g$ = 978 cm/sec²; $\varepsilon_g$ = 1 cm/sec².
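
The more accurate estimate of the limiting error can be reproduced term by term from the unsimplified formula. The following Python sketch is an illustration only: the function name is an assumption, and the inputs are the observed values and limiting errors quoted in the example.

```python
def g_limiting_error_terms(l, P, eps_l, eps_P, pi_approx=3.1416, eps_pi=0.00005):
    """Return the three terms of
    eps_g = (2*pi*l/P^2)*eps_pi + (pi^2/P^2)*eps_l + (2*pi^2*l/P^3)*eps_P."""
    term_pi = 2 * pi_approx * l / P**2 * eps_pi
    term_l = pi_approx**2 / P**2 * eps_l
    term_P = 2 * pi_approx**2 * l / P**3 * eps_P
    return term_pi, term_l, term_P

# Values quoted in the example: l = 50.02 cm, P = 0.7098 sec.
terms = g_limiting_error_terms(l=50.02, P=0.7098, eps_l=0.01, eps_P=0.0001)
eps_g = sum(terms)
print([round(t, 3) for t in terms], round(eps_g, 2))
```

The sum comes out close to the 0.5 cm/sec² attributed to the more accurate calculation and stays below the mental estimate of 0.8 cm/sec²; the largest contribution comes from the error in the period.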


Example 5. To calculate the orbit of a small planet from three observations, we need
to calculate the mean anomaly $M$ when we know the eccentricity $e$ and the eccentric
anomaly $E$. This calculation is made on the basis of Kepler's equation


$$M = E - e \sin E.$$

Let us take $E$ = 43° 35′ 16″ and $e$ = 0.14136. We need to calculate the mean anomaly
$M$ and to estimate its limiting absolute error. Suppose that all the significant figures
written in the expressions for $E$ and $e$ are definitely known and that we know that these
numbers are obtained by rounding off. The limiting absolute error in the eccentricity is
equal to 0.000005. The error in the eccentric anomaly is equal to 0.5″. If we convert
this value to radian measure (dividing by 200,000 instead of 206,265; see Example 1),
we obtain $\varepsilon_E$ = 0.0000025. We then compute the limiting error of the mean anomaly
from the formula


$$\varepsilon_M = (1 - e \cos E)\,\varepsilon_E + |\sin E|\,\varepsilon_e.$$


In this formula, we do not need to write the absolute value of the derivative with respect
to the eccentric anomaly, since the value appearing in parentheses has to be positive:
$e < 1$ and the absolute value of the cosine never exceeds unity.

To calculate the limiting error, we take $\cos E$ = 0.72 (estimating too low), $e$ = 0.14
(estimating too low), and $\sin E$ = 0.70 (estimating too high). Then, $e \cos E$ = 0.10 and
$1 - e \cos E$ = 0.9. Thus,


$$\varepsilon_M = 0.9 \cdot 2.5 \cdot 10^{-6} + 0.7 \cdot 5 \cdot 10^{-6}.$$


Increasing the limiting error only slightly, we obtain $\varepsilon_M$ = 0.000006 radians.
Conversion to an estimate in seconds of arc by multiplying by 206,265 (we actually multiply
by a slightly greater number) gives $\varepsilon_M$ = 1.3″.

Remark: To calculate $M$, we must either express the angle $E$ in radians or express
the eccentricity in seconds or degrees of arc. This is explained by the fact that Kepler's
equation $M = E - e \sin E$ is a formula derived using the tools of analysis and therefore
the angles in it are naturally expressed in radians. To make possible the use of ordinary
tables of trigonometric functions (in which the argument is expressed in degree
measure), we need to multiply all the terms in the equation by the number of seconds
(or degrees) in a radian. Then, the angles $E$ and $M$ are expressed in seconds and in the
term $e \sin E$ the conversion factor can be applied either to $e$ or to $\sin E$. It is customary
to treat the eccentricity formally as an angle in radians. Therefore, it is converted
into seconds of arc. When we express the eccentricity in seconds, multiplying by 206,265,
we obtain the value $e$ = 29,158″ in our problem. If we now multiply this number by the
5-place value of the sine, 0.68947, we obtain $(E - M)''$ = 20,104″. Therefore, $E - M$ = 5° 35′
04″. Finally, we have $M$ = 38° 00′ 12″ and $\varepsilon_M$ = 1.3″. Consequently, in the value of the
mean anomaly, we cannot vouch for whole seconds. An error in either direction of
one second is possible (that is, it can be 11″, 12″, or 13″).
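
The computation of $M$ and of its limiting error can be retraced without the deliberate roundings used above. The following Python sketch is an illustration under stated assumptions: the helper name `dms_to_arcsec` and the unrounded conversion constant are ours, and the library sine replaces the 5-place table.

```python
import math

ARCSEC_PER_RADIAN = 206264.806  # seconds of arc in one radian (the text uses 206,265)

def dms_to_arcsec(degrees, minutes, seconds):
    """Convert an angle given in degrees, minutes, seconds to seconds of arc."""
    return degrees * 3600 + minutes * 60 + seconds

E_arcsec = dms_to_arcsec(43, 35, 16)
e = 0.14136
E = E_arcsec / ARCSEC_PER_RADIAN            # eccentric anomaly in radians

# Kepler's equation, with the term e*sin(E) converted to seconds of arc.
M_arcsec = E_arcsec - e * math.sin(E) * ARCSEC_PER_RADIAN

# Limiting error: eps_M = (1 - e*cos E)*eps_E + |sin E|*eps_e, in radians.
eps_E = 0.5 / ARCSEC_PER_RADIAN
eps_e = 0.000005
eps_M_arcsec = ((1 - e * math.cos(E)) * eps_E + abs(math.sin(E)) * eps_e) * ARCSEC_PER_RADIAN

deg, rem = divmod(M_arcsec, 3600)
minute, sec = divmod(rem, 60)
print(int(deg), int(minute), round(sec, 1), round(eps_M_arcsec, 2))
```

Because no quantity is rounded in the unfavorable direction here, the limiting error comes out somewhat below 1.3″, and the value of $M$ agrees with the tabular computation to within that error.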

Example 6. Let us calculate the total surface area $P$ of a right circular cone whose
base has radius $r$ = 3.4 cm and whose generator has length $l$ = 7.6 cm. Let us find the
limiting relative error of the result $P$.

The total area is given by the formula


$$P = \pi r^2 + 2\pi r l.$$

The formula for calculating the limiting absolute error is of the form

$$\varepsilon_P = (r^2 + 2rl)\,\varepsilon_\pi + \pi (2r + 2l)\,\varepsilon_r + 2\pi r\,\varepsilon_l.$$

Taking $\pi$ = 3.14, we obtain for the limiting absolute errors of the arguments

$$\varepsilon_\pi = 0.5 \cdot 10^{-2}, \qquad \varepsilon_r = \varepsilon_l = 0.5 \cdot 10^{-1}.$$


If the coefficients of the limiting errors in the arguments are calculated with the values
given for them,


$$\varepsilon_P = 64\,\varepsilon_\pi + 70\,\varepsilon_r + 22\,\varepsilon_l, \quad\text{or}\quad \varepsilon_P = 5\ \text{cm}^2.$$


Thus, in the value for the total area, we have certainty only up to tens of square
centimeters. Calculation gives $P$ = 200 cm² and $\varepsilon_P$ = 5 cm². In the expression for $P$, the
digits 2 and 0 are certain; $\delta_P$ = 2.5%.


14. THE CONCEPT OF THE INVERSE PROBLEM IN THE 
THEORY OF APPROXIMATE CALCULATIONS 


Suppose that $u$ is a function of $n$ independent variables $x, y, \ldots, w$,
whose approximate values can vary somewhat in accuracy.

Let us suppose that a certain required accuracy for the function
is given in advance, that is, that the limiting absolute error $\varepsilon_u$ of the
quantity $u$ is given. The problem is to determine the limiting absolute
errors in the arguments in such a way as to ensure this given
accuracy in the function. We shall call this problem the inverse
problem of approximate calculations.

If the function has more than one argument, the solution obviously
will not be uniquely determined, since only one number (the error $\varepsilon_u$) is
given and, to solve the problem, we need to find as many unknown limiting
errors as there are arguments. In practical problems, a convention
called the method of equal influences is often used. The principle
of this convention consists in the following: in accordance with
(3.28), let us write the expression for the limiting error of the
function in terms of the limiting errors of the arguments:


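$$\varepsilon_u = \left|\frac{\partial u}{\partial x}\right| \varepsilon_x + \left|\frac{\partial u}{\partial y}\right| \varepsilon_y + \ldots + \left|\frac{\partial u}{\partial w}\right| \varepsilon_w,$$

where the partial derivatives are evaluated at the approximate values of the arguments.


The method of equal influences consists in choosing errors of the
arguments in such a way that all the terms on the right side of this
equation will have the same value:


$$\left|\frac{\partial u}{\partial x}\right| \varepsilon_x = \left|\frac{\partial u}{\partial y}\right| \varepsilon_y = \ldots = \left|\frac{\partial u}{\partial w}\right| \varepsilon_w = \frac{\varepsilon_u}{n}, \tag{3.29}$$


where $n$ is the number of arguments. These equations give the
expressions for the limiting errors in the arguments:


$$\varepsilon_x = \frac{\varepsilon_u}{n \left|\dfrac{\partial u}{\partial x}\right|}, \quad \varepsilon_y = \frac{\varepsilon_u}{n \left|\dfrac{\partial u}{\partial y}\right|}, \quad \ldots, \quad \varepsilon_w = \frac{\varepsilon_u}{n \left|\dfrac{\partial u}{\partial w}\right|}. \tag{3.30}$$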


It may be that the limiting errors obtained in this manner are 
not all admissible from the conditions of the observations. In such 
cases, we obviously must make some sort of modification in the 
convention, but we should still take an accuracy as close as 
possible to the one obtained from the method of equal influences. 
A justification for this method cannot be easily demonstrated; it is 
only a convenient convention. 

A second variation on the method of equal influences is the convention
of setting all the terms in the expression for the limiting
relative error of the function equal to each other. This leads to the
equations


$$\left|\frac{x}{u}\frac{\partial u}{\partial x}\right| \delta_x = \left|\frac{y}{u}\frac{\partial u}{\partial y}\right| \delta_y = \ldots = \left|\frac{w}{u}\frac{\partial u}{\partial w}\right| \delta_w = \frac{\delta_u}{n}, \tag{3.31}$$

$$\delta_x = \frac{\delta_u}{n \left|\dfrac{x}{u}\dfrac{\partial u}{\partial x}\right|}, \quad \ldots, \tag{3.32}$$

where the partial derivatives are again evaluated at the approximate values of the
arguments.


It is easy to see that the expressions for the limiting relative
errors of the arguments are formally equivalent to the expressions
for the limiting absolute errors, since by means of elementary
transformations, one can obtain an expression for $\delta_x$ from the
expression for $\varepsilon_x$, etc. However, it often happens in approximate
calculations that from a practical standpoint there can be a significant
difference, since two formally equivalent formulas can (as a
result of the fact that we necessarily use a finite number of digits)
give two numbers that are different both in magnitude and in
accuracy. Therefore, in more important problems, we should try
to write both expressions and compare the suitability and accuracy
of calculating with each.


Example. The volume of a cone is computed from the familiar formula


$$V = \frac{1}{3}\pi r^2 h,$$


where $r$ is the radius of the base and $h$ is the altitude of the cone. Suppose that the
approximate values of these quantities are 3.2 cm and 4.7 cm, respectively. The
differential formula will be of the form

$$3\varepsilon_V = r^2 h\,\varepsilon_\pi + 2\pi r h\,\varepsilon_r + \pi r^2\,\varepsilon_h.$$

We are given $\varepsilon_V$. From the method of equal influences, we have

$$r^2 h\,\varepsilon_\pi = \varepsilon_V, \qquad 2\pi r h\,\varepsilon_r = \varepsilon_V, \qquad \pi r^2\,\varepsilon_h = \varepsilon_V.$$

When we determine the limiting absolute errors of the arguments from these equations,
we may decrease them; that is, we may increase the values of the partial derivatives.
Therefore, for calculation, we have


$$r = 3.0, \quad r^2 = 11, \quad h = 5, \quad \pi h = 16, \quad \pi r^2 = 32.$$


We then obtain each limiting error as a fraction of $\varepsilon_V$; for instance,
$\varepsilon_\pi = \varepsilon_V / 55$.

If, for example, $\varepsilon_V$ = 1 cm³, we obtain $\varepsilon_\pi$ = 0.02, $\varepsilon_r$ = 0.01 cm, and $\varepsilon_h$ = 0.03 cm. Here,
we departed slightly from the general rule by taking values somewhat higher than they
could be (0.02 instead of 1/55, etc.). The necessary accuracy in measurements can be
obtained.
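
The method of equal influences, formula (3.30), is easy to mechanize. The following Python sketch applies it to the cone with the unrounded partial derivatives; the function name `equal_influence_errors` and the dictionary layout are assumptions made for this illustration, and $\pi$ is treated as an approximate argument exactly as in the example.

```python
import math

def equal_influence_errors(partials, eps_u):
    """Formula (3.30): eps_x = eps_u / (n * |du/dx|) for each of the n arguments."""
    n = len(partials)
    return {name: eps_u / (n * abs(d)) for name, d in partials.items()}

r, h = 3.2, 4.7  # approximate radius and altitude, in cm

# Partial derivatives of V = pi * r^2 * h / 3 with respect to pi, r, and h.
partials = {
    "pi": r**2 * h / 3,
    "r": 2 * math.pi * r * h / 3,
    "h": math.pi * r**2 / 3,
}

errors = equal_influence_errors(partials, eps_u=1.0)  # eps_V = 1 cm^3
for name, eps in errors.items():
    print(name, round(eps, 4))
```

Rounded to two decimals, these limiting errors agree with the values 0.02, 0.01 cm, and 0.03 cm quoted above, and each argument contributes exactly one third of $\varepsilon_V$.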


Part II


POINT INTERPOLATION 


Chapter 4 
GENERAL REMARKS 


15. THE APPROXIMATION OF TABULATED 
FUNCTIONS. THE CONCEPT OF POINT 
INTERPOLATION 


Various functions are used in investigating natural phenomena
mathematically. The functions may be defined in various ways.
The simplest of these is a definition by means of an analytic
expression, which makes it possible to determine the value of the
function from any (admissible) value of the argument or arguments.
In practice, such cases occur only rarely. However, even when
they do occur, the operations indicated in the definition of the
function can be extremely tedious. Therefore, the use of a given
analytic expression to compute directly the values of the function
at arbitrary values of the argument can present difficulties.

Frequently, functions are defined by means of infinite series.
Calculation of the values of a function by means of an infinite
series is a rather tedious operation, requiring investigation as to
the convergence of the series and a determination of the number
of terms that must be kept to ensure a specified degree of accuracy.
Therefore, it is a tedious matter to use a series for calculating
the values of a function at an arbitrary value of the argument.

A function may also be determined by means of an indefinite
integral or by a differential equation. Some physical problems
give rise to indefinite integrals or solutions of differential equations
that can be expressed in closed form, but even then they can be
cumbersome. As an example, let us consider the function $f(x)$
defined by the equation


fo=) waye ray 


As we know, such an integral can be expressed in closed form in
terms of elementary functions, but such an expression will be so
cumbersome that it is hardly advisable to calculate $f(x)$ from the
exact formula every time that we use it. And, in most cases,
functions defined by integrals or by differential equations cannot
be expressed in closed form in terms of the elementary functions.
The numerical values of such a function need to be determined by
some approximate method for the various values of the argument.

In all those cases in which it is either impossible to calculate
the values of a function exactly or the calculation is too tedious,
we resort to a table that has been compiled for the function (which
will be the case if the function occurs fairly frequently in various
contexts).

We should mention one other method of defining a function, one
that leads directly to a table of its values. This is the case in
which we may assert, on the basis of general physical considerations,
that a certain value represents a function of one or several
arguments,* but the phenomenon is not sufficiently well studied to
show the connection in a mathematical expression. In such cases,
observations are made leading to a table of values of the function
for various values of the argument. For the most part, such a
table can be obtained only for a rather limited number of values
of the argument. Here, expansion of the table is either impossible
or it presents great difficulties.

In all the cases that we have listed and in others similar to
them, we come, in the last analysis, to a tabular value of the
function. The function $x = f(t)$ is determined by a table of its
values $x_k$ for given values of the argument $t_k$ (for $k = 1, 2, \ldots, n$).
The tabular values of a function and its argument are called the
basic points of the table. If we construct a graph of a tabular
function of a single argument, the points on the graph are also
called basic points.

The values $t_k$ and $x_k$ are given with a definite number of sure
digits, usually with a certain number of digits to the right of the
decimal. If the table is computed by means of an exact analytic
expression or a device allowing an estimate of the error (for
example, by means of an infinite series), these digits in the tabular
values of the function can, except for the last one, be considered
as correct. The last digit cannot deviate by more than a unit from
what it would be if the computations had been carried out with a
greater number of digits.

For example, consider the set of digits to the right of the
decimal in the expression tan 22° 19′ = 0.4104697. In rounding
this figure off to five decimal places, the fifth digit that we obtain
differs by one from the corresponding digit in the more exact value
of the function, namely, tan 22° 19′ = 0.41047.


*Natural phenomena are in general quite complicated, the relationships between the
different quantities are quite varied, and it rarely happens that we can consider some
quantity exactly as a function of one, two, or three arguments. When the quantity observed is
considered to be precisely a function of only one observed argument, this almost always
means that this one argument is the principal factor determining the value of the function
and that the influence of other arguments on the value of the function can be neglected
under the conditions of accuracy with which the observations are made. For example,
the heliocentric coordinates of a small planet or comet can, in the course of a brief
interval of time, be considered as functions of the osculating elements and time, and
these coordinates may be calculated from the two-body formulas since the perturbations
are small and may be neglected.

If a table is obtained for a function by means of a numerical
solution to a system of differential equations (or to a single equation),
the situation becomes more complicated because there is as
yet no sufficiently reliable and simple method of estimating the
error of a numerical solution of a differential equation. An even
greater complication arises when a tabular function is obtained
from observations. In this case, the values of the function contain
a number of errors of different origins and, therefore, the functional
relationship must be sought. If observations give several significant
figures, at least one and sometimes the last two or three
are unreliable. Generally speaking, in this last case, we may not
apply the convention of determining the limiting error from an
expression for an approximate number (see Part I).*

Thus, the values of a function are quite frequently given by
tables; that is, a value of the function is given for each of a certain
set of values of the argument. Usually, the table is arranged in
such a way that values of the argument are listed in increasing
order. The values of the argument are given at certain intervals,
known as steps. When feasible, tables are compiled with a constant
step, but this is not always suitable, since it requires an
extra amount of work without increasing the usefulness, if there
are regions in which the function changes slowly or almost linearly.
In such cases, the entire tabulated domain of values of the argument
is broken into several parts and a constant step is chosen for each.
Generally speaking, these steps cannot be small; otherwise, the
size of the table would be too great.

In solving problems that occur in nature, we usually deal with
cases in which we need to know the values of the function at other
values of the argument than those listed in the table. For example,
we often need to find the coordinates of the sun relative to the
center of the earth, but it is usually not at $0^{\mathrm{h}}$ universal time
(which is given in the almanacs) but at quite different instants of
time situated between those that are tabulated. Therefore, the
following problem is of great practical significance: suppose that
values of a function have been tabulated. We must find a method of
determining approximately the values of the function for arbitrary
values of the argument other than those listed in the table.

If the value of the argument for which we wish to find the value
of the function lies between values of the argument that appear in
the table, the problem is one of interpolation. If the value of the
argument in question is greater than or less than every value
appearing in the table, it is a problem of extrapolation.


*The convention that the limiting error is less than one half the unit of the last digit
has to do only with errors in rounding off that are incurred in measurements and
calculations under the condition that the units of the last digit are determined from a scale
and not by the eye. It is usual to speak in this fashion if the position of the needle between
the divisions of the scale is determined by a visual estimate in terms of tenths of the
interval between these divisions.

The simplest procedure for interpolating, which is still sometimes used, consists
in tracing by hand a smooth curve through the points on a graph representing the basic
points of the table. This curve is used as an approximate graph of the function and an
interpolation is made from this graph in an obvious fashion. The accuracy of this method
is quite limited, as is the accuracy of every graph. Also, the tracing of the curve is
somewhat arbitrary and indefinite. Therefore, such a method is not applicable to
tabulated functions all of whose digits are reliable, since a loss in accuracy would ensue;
that is, the interpolated values would be less exact than the tabulated values. A graphical
interpolation can be applied only in those cases in which the table is obtained from
insufficiently reliable observations and the functional relationship itself is not sufficiently
reliable.


Even in Newton's time (and earlier) the development of a
method of interpolation from a tabulated function had been reduced
to an approximation of the tabulated function by means of another
function that allows easier calculations. The approximate expression
of the given function that is constructed is used for interpolation.
The given value of the argument is substituted into it
and the calculations are made. Usually, an approximate representation
of a small part of the table in the neighborhood of a
given value of the argument is constructed.

The construction of an approximation of a tabulated function
is an indeterminate problem and is impossible without the preliminary
introduction of two conventions or assumptions.

In the first place, some agreement must be made with regard
to the class of functions used for approximation. In practical
problems, a natural requirement is that it be easy to calculate
the value of the function from a given value of the argument.
This condition is satisfied by algebraic polynomials, which are
almost exclusively used for approximation. If a function possesses
properties that algebraic polynomials do not satisfy, other functions
are used. Trigonometric polynomials are used most often
if a function is periodic and an approximation is necessary in a
region containing the entire period. If a function increases more
rapidly than a polynomial, it would be natural to use exponential
functions, but in practice, this is rarely done. When we need to
approximate a periodic function over only a small part of its
period, algebraic polynomials are usually used. The chosen
approximating function must contain some number of literal
parameters, which must be determined from the given table.
If an algebraic polynomial of degree $n$ is used for approximation,
the $n + 1$ coefficients in the polynomial are examples of such
parameters.

It is natural to demand that the approximation be as good as
possible, but the meaning of the word "good" needs to be defined.
This is the second requirement imposed on the problem.

In practice, various criteria for the best approximation are used.

In this part of the book (in the solution of the problem of
approximating a function), we shall use the following criterion:
the approximating polynomial must fit the basic points
of the table exactly.* It follows from this criterion that the degree of the
polynomial used for interpolation must be one unit less than the
number of basic points that are taken. This is because an $n$th-degree
polynomial has $n + 1$ coefficients and, to determine them,
$n + 1$ conditions are necessary.


*Another criterion will be considered in Chapter 17.


The problem of interpolation of a function of a single variable
$x = x(t)$ can be represented graphically as follows: suppose that
points $(t_k, x_k)$ for $k = 0, 1, 2, \ldots, n$ are given in the $tx$-plane.
These are the basic points of the table. We must construct an
$n$th-degree parabola that will pass exactly through these basic points.
For this reason, interpolation under this requirement is called
point interpolation.


16. A THEOREM ON THE EXISTENCE OF AN 
INTERPOLATIONAL POLYNOMIAL 


Before deriving specific formulas for interpolational polynomials,
we need to see what the conditions are under which a polynomial
satisfying the requirement of point interpolation exists, and
whether such a polynomial is unique or not. The answer to this
problem is given by the following.

THEOREM. If all the tabulated values of the argument are
different, an interpolational polynomial satisfying the condition of
point interpolation exists and is unique.

Proof: Suppose that the $n + 1$ basic points $(t_0, x_0), (t_1, x_1), \ldots, (t_n, x_n)$
of a function $x = F(t)$ are given, and that $t_s \neq t_k$ for $s \neq k$.
An $n$th-degree interpolational polynomial


$$P(t) = a_0 + a_1 t + a_2 t^2 + \ldots + a_n t^n \tag{4.1}$$

must, by the definition of point interpolation, satisfy the conditions

$$P(t_0) = x_0, \quad P(t_1) = x_1, \quad P(t_2) = x_2, \quad \ldots, \quad P(t_n) = x_n. \tag{4.2}$$


From this, we obtain the following system of $n + 1$ linear
algebraic equations with unknown coefficients $a_s$ (for $s = 0, 1, 2, \ldots, n$):


$$\begin{aligned}
a_0 + a_1 t_0 + a_2 t_0^2 + \ldots + a_n t_0^n &= x_0,\\
a_0 + a_1 t_1 + a_2 t_1^2 + \ldots + a_n t_1^n &= x_1,\\
a_0 + a_1 t_2 + a_2 t_2^2 + \ldots + a_n t_2^n &= x_2,\\
\ldots\ldots\ldots\ldots\ldots\ldots\ldots\ldots\ldots\ldots &\\
a_0 + a_1 t_n + a_2 t_n^2 + \ldots + a_n t_n^n &= x_n.
\end{aligned} \tag{4.3}$$


The question of the existence of an interpolational polynomial
reduces to the question of whether the determinant of this
system vanishes:


$$W = \begin{vmatrix}
1 & t_0 & t_0^2 & \ldots & t_0^n \\
1 & t_1 & t_1^2 & \ldots & t_1^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & t_n & t_n^2 & \ldots & t_n^n
\end{vmatrix}. \tag{4.4}$$


It is known from algebra that the value of this determinant, known
as Vandermonde's determinant, is given by the formula


$$\begin{aligned}
W = {} & [(t_n - t_0)(t_n - t_1) \ldots (t_n - t_{n-1})] \times \\
& \times [(t_{n-1} - t_0)(t_{n-1} - t_1) \ldots (t_{n-1} - t_{n-2})] \times \\
& \ldots \times [(t_2 - t_0)(t_2 - t_1)]\,(t_1 - t_0).
\end{aligned}$$


By hypothesis, $t_k \neq t_s$ if $k \neq s$; hence,

$$W \neq 0. \tag{4.5}$$


Since the determinant of the linear system is not 0, this system
has a uniquely defined solution $a_0, a_1, \ldots, a_n$. Consequently, one
and only one interpolational polynomial exists.
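
The argument of the proof can be followed on a small table. The Python sketch below evaluates Vandermonde's determinant by the product formula and solves system (4.3) by exact Gaussian elimination; the function names and the sample table (values of $t^2$ at $t$ = 0, 1, 2) are assumptions made for this illustration.

```python
from fractions import Fraction

def vandermonde_determinant(ts):
    """Product of (t_k - t_s) over all pairs s < k; nonzero iff all t are distinct."""
    det = Fraction(1)
    for k in range(len(ts)):
        for s in range(k):
            det *= Fraction(ts[k]) - Fraction(ts[s])
    return det

def interpolation_coefficients(ts, xs):
    """Solve system (4.3) for a_0, ..., a_n by Gauss-Jordan elimination.
    Assumes the values in ts are all different, as the theorem requires."""
    n = len(ts)
    rows = [[Fraction(t) ** p for p in range(n)] + [Fraction(x)] for t, x in zip(ts, xs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

ts, xs = [0, 1, 2], [0, 1, 4]
print(vandermonde_determinant(ts))          # nonzero, so the solution is unique
print(interpolation_coefficients(ts, xs))   # coefficients a_0, a_1, a_2
```

Exact rational arithmetic is used here only so that a vanishing determinant is detected without rounding doubts; for tables of measured values a floating-point solver would serve equally well.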

We note that this theorem can be considered the only basis for
a formal application of the method of point interpolation to an
arbitrary function, since in the theorem $t_k$ and $x_k$ are arbitrary
numbers. The question as to how satisfactory the approximation
given by the polynomial is, is decided from the properties of the
function by means of an estimate of the interpolational error.
Sometimes, to justify the method, we cite the possibility of
approximating a function by means of a partial sum of a power
series, or Weierstrass' theorem on the possibility of approximating
a continuous function by means of a polynomial with any desired
degree of accuracy. However, neither of these has a direct bearing
on the problem of point interpolation. No partial sum of a power
series can exactly represent a function at all basic points, since,
if the coefficients of the polynomial are determined by the process
used in point interpolation, they will not be equal to the corresponding
coefficients of the series.

To illustrate this assertion, let us suppose that a table of
values of a periodic function is given and that the step in the table
is equal to the period of the function. For example, suppose that
a table for $\sin t$ gives values for $t = 0, 2\pi, 4\pi$, etc. Clearly, the
interpolational polynomial that we would construct has a constant
value, namely, zero. We would obtain the initial term of the series,
but the approximation cannot be considered satisfactory. Thus, the
possibility of constructing a power series for this function does
not have a direct bearing on the problem of interpolation.

Weierstrass' theorem asserts that for a given continuous function $x(t)$
and an arbitrary positive number $\varepsilon$, we may find a polynomial
$P_n(t)$ satisfying the condition


$$|x(t) - P_n(t)| < \varepsilon$$


everywhere on the interval $a \leq t \leq b$, but the conditions of this
theorem and its proof do not assume that $x(t_k) = P_n(t_k)$ for any
sequence of values of the argument $t_k$ (for $k = 1, 2, \ldots, n$).


17. LAGRANGE’S INTERPOLATIONAL POLYNOMIAL 


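To find the interpolational polynomial, it is sufficient to solve in
literal form the system of equations given in Section 16, which
define the coefficients of the polynomial. If the system is solved
by means of determinants, every coefficient $a_s$ is determined
by the formula


$$a_s = \frac{W_s}{W}, \tag{4.6}$$

where

$$W_s = \begin{vmatrix}
1 & t_0 & t_0^2 & \ldots & t_0^{s-1} & x_0 & t_0^{s+1} & \ldots & t_0^n \\
1 & t_1 & t_1^2 & \ldots & t_1^{s-1} & x_1 & t_1^{s+1} & \ldots & t_1^n \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
1 & t_n & t_n^2 & \ldots & t_n^{s-1} & x_n & t_n^{s+1} & \ldots & t_n^n
\end{vmatrix} \qquad (s = 0, 1, 2, \ldots, n) \tag{4.7}$$


and $W$ is defined by (4.4).
If $W_s$ is expanded in terms of the elements of the column
$(x_0, x_1, \ldots, x_n)$, then $W_s$ will be a linear expression of the form

$$W_s = \sum_{k=0}^{n} x_k \Delta_{ks}, \tag{4.8}$$


where the $\Delta_{ks}$ are numbers depending on all the basic points of the
values of $t$ and on the number $s$. Then, the coefficients of the
polynomial take the form


$$a_s = \sum_{k=0}^{n} x_k l_{ks}, \tag{4.9}$$

where

$$l_{ks} = \frac{\Delta_{ks}}{W}. \tag{4.10}$$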


If we use (4.9) to substitute all the $a_s$ into the polynomial (4.1)
and if we collect the terms containing $x_k$, we obtain


$$P(t) = \sum_{k=0}^{n} x_k L_k(t), \tag{4.11}$$


where the $L_k(t)$ are polynomials of degree $n$. Lagrange showed
how one might obtain the values of $L_k(t)$ without solving the system
of equations for the coefficients $a_0, a_1, a_2, \ldots, a_n$.

To satisfy the condition of point interpolation


$$P(t_k) = x_k \qquad (k = 0, 1, \ldots, n), \tag{4.12}$$

the polynomials $L_k(t)$ must satisfy the conditions

$$L_k(t_k) = 1, \qquad L_k(t_s) = 0 \ \text{ if } k \neq s; \qquad k = 0, 1, 2, \ldots, n. \tag{4.13}$$


The second of conditions (4.13) means that all the basic points of
$t$ except $t_k$ are roots of $L_k(t)$. Therefore,


$$L_k(t) = A_k (t - t_0)(t - t_1) \ldots (t - t_{k-1})(t - t_{k+1}) \ldots (t - t_n),$$


where $A_k$ is an unknown coefficient. This coefficient is easily
determined from the first of conditions (4.13):


$$A_k (t_k - t_0)(t_k - t_1) \ldots (t_k - t_{k-1})(t_k - t_{k+1}) \ldots (t_k - t_n) = 1.$$


Thus, finally,


$$L_k(t) = \frac{t - t_0}{t_k - t_0} \cdot \frac{t - t_1}{t_k - t_1} \cdots \frac{t - t_{k-1}}{t_k - t_{k-1}} \cdot \frac{t - t_{k+1}}{t_k - t_{k+1}} \cdots \frac{t - t_n}{t_k - t_n}.$$

We introduce the notation

$$L(t) = \prod_{s=0}^{n} (t - t_s).$$


If we differentiate $L(t)$ and substitute $t = t_k$ into the derivative, we
shall obtain the denominator of the last expression for $L_k(t)$, since
the derivative will consist of a sum of products only one of which fails to
contain $t - t_k$. After the substitution, this term will give the
denominator and the remaining products will vanish. Therefore,


$$L_k(t) = \frac{L(t)}{(t - t_k)\,L'(t_k)}. \tag{4.14}$$


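Thus, Lagrange's interpolational polynomial, containing directly
the basic points $(t_k, x_k)$, is of the form


$$P(t) = \sum_{k=0}^{n} x_k \frac{L(t)}{(t - t_k)\,L'(t_k)} \tag{4.15}$$


or, in expanded form,


$$P(t) = \sum_{k=0}^{n} x_k \frac{(t - t_0)(t - t_1) \ldots (t - t_{k-1})(t - t_{k+1}) \ldots (t - t_n)}{(t_k - t_0)(t_k - t_1) \ldots (t_k - t_{k-1})(t_k - t_{k+1}) \ldots (t_k - t_n)}. \tag{4.16}$$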
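
Formula (4.16) translates directly into a double loop over the basic points. The following Python sketch is one possible rendering; the function name and the sample table (values of $t^3$ at four points) are assumptions chosen for illustration.

```python
def lagrange_interpolate(ts, xs, t):
    """Evaluate Lagrange's interpolational polynomial (4.16) at the point t.
    ts and xs hold the basic points (t_k, x_k); all values in ts must differ."""
    total = 0.0
    for k, (t_k, x_k) in enumerate(zip(ts, xs)):
        basis = 1.0  # builds L_k(t) as a product over all s != k
        for s, t_s in enumerate(ts):
            if s != k:
                basis *= (t - t_s) / (t_k - t_s)
        total += x_k * basis
    return total

# Hypothetical table of x = t^3; four basic points determine a cubic exactly.
ts = [0.0, 1.0, 2.0, 3.0]
xs = [0.0, 1.0, 8.0, 27.0]
print(lagrange_interpolate(ts, xs, 1.5))
```

At each basic point the factor $L_k$ equals one for its own point and zero for all the others, so the polynomial reproduces the table exactly, as point interpolation requires.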


General Remarks 53 
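The expanded form (4.16) translates almost literally into a short program. The following is only a minimal sketch in Python; the function name `lagrange` and its argument names are illustrative and do not come from the text.

```python
def lagrange(nodes, values, t):
    """Evaluate the Lagrange interpolational polynomial (4.16) at t.

    nodes  -- the basic points t_0, ..., t_n (assumed pairwise distinct)
    values -- the tabulated values x_0, ..., x_n
    """
    total = 0.0
    for k, (t_k, x_k) in enumerate(zip(nodes, values)):
        term = x_k
        # Multiply by (t - t_s) / (t_k - t_s) for every basic point except t_k.
        for s, t_s in enumerate(nodes):
            if s != k:
                term *= (t - t_s) / (t_k - t_s)
        total += term
    return total


# Two basic points (1, 1) and (2, 4) give the straight line 3t - 2.
print(lagrange([1.0, 2.0], [1.0, 4.0], 1.5))
```

With the two points (1, 1) and (2, 4) the call evaluates the line 3t - 2 at t = 1.5, which gives 2.5.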


Example 1. Consider the function defined by the table

    t |  0   1   2
    x |  0   1   4

Let us give an approximation of this function for the intervals $0 \le t \le 1$ and $1 \le t \le 2$.

(a) On the interval $0 \le t \le 1$, we use the end points of this interval.
From Lagrange’s formula, we obtain

$$P_1(t) = \frac{t - 1}{0 - 1} \cdot 0 + \frac{t - 0}{1 - 0} \cdot 1 = t.$$

(b) On the interval $1 \le t \le 2$, we have

$$P_2(t) = \frac{t - 2}{1 - 2} \cdot 1 + \frac{t - 1}{2 - 1} \cdot 4 = 3t - 2.$$

Consequently,

$$P(t) = t \ \text{ if } 0 \le t \le 1, \qquad P(t) = 3t - 2 \ \text{ if } 1 \le t \le 2.$$

The table was constructed for the function $x = t^2$. Therefore, we may compare
the piecewise-linear approximation that we have made with the exact expression $x = t^2$.
On the interval [0, 1], the difference between the exact expression and $P_1(t)$ is $\varepsilon_1 = t^2 - t$.
It is easy to see that $\varepsilon_1$ has a minimum of -0.25 at $t = 0.5$. This means that

$$|\varepsilon_1| \le 0.25.$$

On the interval [1, 2], we have $\varepsilon_2 = t^2 - 3t + 2$; $\varepsilon_2$ has a minimum of -0.25 at
$t = 1.5$. We see that the approximation of the given function by a piecewise-linear
function with two links ensures a limiting error of 0.25 throughout the interval of the
table, namely, [0, 2].

Example 2. Construct an approximation of the function $x = \sin t$ by a linear function
on the interval $0 \le t \le \pi/4$ from the basic points at the end points of the interval:

    t |  0   $\pi/4$
    x |  0   $\sqrt{2}/2$

From Lagrange’s formula, we obtain

$$P(t) = \frac{t - \pi/4}{0 - \pi/4} \cdot 0 + \frac{t - 0}{\pi/4 - 0} \cdot \frac{\sqrt{2}}{2} = \frac{2\sqrt{2}}{\pi}\, t$$

or

$$P(t) = 0.90031\, t.$$

If we use this approximation to interpolate the value for $t = \pi/6$, we obtain
$P(\pi/6) = \sqrt{2}/3 \approx 0.471$ instead of the exact value 0.5.

Here, we can also investigate the accuracy of the approximation in the general form.
The difference between the function and the polynomial is

$$\varepsilon = x - P(t) = \sin t - \frac{2\sqrt{2}}{\pi}\, t.$$

If we set the derivative equal to 0, we see that $\varepsilon$ has a maximum of 0.43525 - 0.40542 =
0.02983 at $t$ = 25° 48′ 05″ = 0.45031 radians. We may take 0.03 as the limiting error of
the approximation for the entire interval $[0, \pi/4]$.


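Example 3. Construct an approximation of the function $x = \sin t$ from the following
three basic points on the interval $[0, \pi/4]$:

    t |  0   $\pi/6$   $\pi/4$
    x |  0   0.5       $\sqrt{2}/2$

From Lagrange’s formula,

$$P(t) = \frac{t\,(t - \pi/4)}{\frac{\pi}{6}\left(\frac{\pi}{6} - \frac{\pi}{4}\right)} \cdot 0.5 + \frac{t\,(t - \pi/6)}{\frac{\pi}{4}\left(\frac{\pi}{4} - \frac{\pi}{6}\right)} \cdot \frac{\sqrt{2}}{2}.$$

Let us now consider this polynomial with argument $\tau$, where $\tau$ is defined by the
equation $t = \frac{\pi}{4}\tau$ ($\tau$ will be the value of the angle if as a unit we take the angle $\pi/4$). After
some simplification, the polynomial takes the form

$$P(\tau) = \left(\frac{9}{4} - \sqrt{2}\right)\tau + \left(\frac{3\sqrt{2}}{2} - \frac{9}{4}\right)\tau^2.$$

We can see by direct verification that $P(\tau)$ passes exactly through all the basic points.

If we use $P(\tau)$ to interpolate the value for $t = \pi/12$,
corresponding to $\tau = 1/3$, we obtain

$$P\left(\tfrac{1}{3}\right) = 0.5 - \frac{\sqrt{2}}{6} \approx 0.264.$$

The value of sin 15°, correct to three decimal places, is 0.259, so that the error in the
interpolation is equal to -0.005.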


18. ESTIMATE OF THE ERROR IN 
POINT INTERPOLATION 


If the given function is not a polynomial, the interpolational 
polynomial will give values coinciding with the values of the function 
only at the basic points and, possibly, at certain other isolated points.*


*We note that this will be the case if the given function is a polynomial of higher
degree than the interpolational polynomial.




In interpolating for given intermediary values of the argument, 
the polynomial only approximately represents the function. It is 
quite essential to determine the limiting error of the point inter- 
polation, that is, the least upper bound of the absolute value of the 
error. This can be done in the following manner: 

Suppose that an interpolational polynomial $P_n(t)$ of $n$th degree
is constructed for a function $x(t)$ with basic points $(t_k, x_k)$ for
$k = 0, 1, 2, \ldots, n$. Let us construct an auxiliary function

$$F(z) = x(z) - P_n(z) - kL(z),$$

where $L(z) = (z - t_0)(z - t_1) \cdots (z - t_n)$ and $k$ is an as yet undetermined
number (not to be confused with the index of the basic points). From the
condition of point interpolation,

$$x(t_k) = P_n(t_k) \qquad (k = 0, 1, 2, \ldots, n),$$

and since, by definition,

$$L(t_k) = 0,$$

it follows that $F(z)$ has $n + 1$ roots $t_0, t_1, \ldots, t_n$. Since $k$ is at our
disposal, we arrange for the function $F(z)$ to have one more root $t \neq t_k$;
that is, we impose the condition

$$x(t) - P_n(t) - kL(t) = 0.$$


Note that we cannot immediately solve for $k$ from this equation
because $x(t)$ is unknown. Let us now apply Rolle’s theorem to the
function $F(z)$. According to that theorem, if the function $F(z)$
vanishes at two consecutive values of the argument $t_k$ and $t_{k+1}$, and
if the conditions of continuity and differentiability are satisfied,
the derivative $dF/dz$ will vanish at at least one value of $z$ between
$t_k$ and $t_{k+1}$. Our function $F(z)$ has $n + 1$ consecutive roots $t_k$ and
an additional root $t$, which upon interpolation appears within one
of the intervals between adjacent roots; that is, one of the intervals
is divided into two parts. In making the interpolation, we obtain a
total of $n + 1$ intervals at whose end points $F(z)$ vanishes, so that
$F(z)$ has $n + 2$ roots.

Suppose that, instead of an interpolation, we are making an
extrapolation. The function must then be defined outside the tabulated
region of the argument and $t$ will be either less than $t_0$ or
greater than $t_n$; that is, in addition to the $n$ intervals between the
consecutive basic points, there will be yet another interval between
$t$ and $t_0$ or between $t_n$ and $t$. If we apply Rolle’s theorem to each
of the $n + 1$ intervals, we see that the derivative $dF/dz$ vanishes at
least $n + 1$ times. The points at which it vanishes will be within
different intervals, that is, they cannot coincide. Let us suppose
that $x(t)$ has derivatives up to the $(n + 1)$st order throughout the
entire tabulated region (and in the extended region if an extrapolation
is being done). We now apply Rolle’s theorem to the function
$dF/dz$ in the intervals between its zeros. There are now $n$ of these
intervals. Thus, we obtain $n$ values at which $d^2F/dz^2$ vanishes. If
we continue this operation with the derivatives of successive
orders, we conclude that the $(n + 1)$st derivative of the function
$F(z)$ vanishes for at least one value of $z$. We denote this value
by $\tau$. (In the case of interpolation, $\tau$ lies within the tabulated
region; in the case of extrapolation, it lies in the extended region.)
It follows from the definition of $F(z)$ that


$$\frac{d^{n+1}F(z)}{dz^{n+1}} = \frac{d^{n+1}x(z)}{dz^{n+1}} - \frac{d^{n+1}P_n(z)}{dz^{n+1}} - k\,\frac{d^{n+1}L(z)}{dz^{n+1}}.$$

The second term on the right side of this equation is equal to 0
since the polynomial $P_n(z)$ of $n$th degree is differentiated $n + 1$
times. The polynomial $L(z)$ of $(n + 1)$st degree, when differentiated
$n + 1$ times, gives $(n + 1)!$, since the coefficient of the highest
power to appear in $L(z)$ is 1. Therefore,

$$\frac{d^{n+1}F(z)}{dz^{n+1}} = \frac{d^{n+1}x(z)}{dz^{n+1}} - k\,(n + 1)!.$$

If we set $z = \tau$, the left side will vanish and we obtain

$$k = \frac{1}{(n + 1)!} \left( \frac{d^{n+1}x}{dz^{n+1}} \right)_{z = \tau}.$$

By hypothesis, $z = t$ is a root of $F(z)$. This means that if we
substitute $z = t$ in the function $F(z)$, we obtain

$$0 = x(t) - P_n(t) - \frac{L(t)}{(n + 1)!} \left( \frac{d^{n+1}x}{dz^{n+1}} \right)_{z = \tau}.$$


From this it is clear that the error of the interpolation $x - P_n$ is
given by the formula

$$x(t) - P_n(t) = L(t)\, \frac{x^{(n+1)}(\tau)}{(n + 1)!}, \qquad (4.17)$$

where $x^{(n+1)}$ denotes the $(n + 1)$st derivative and

    $t_0 < \tau < t_n$ in the case of interpolation,
    $t < \tau < t_n$ or $t_0 < \tau < t$ in the case of extrapolation.
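Formula (4.17) can be checked numerically on a function whose derivatives are known. The sketch below takes, purely as an illustrative choice, $x = \sqrt{t}$ with the two basic points 0.25 and 1 (so $n = 1$), solves (4.17) for $x''(\tau)$, and then recovers $\tau$ from $x''(\tau) = -1/(4\tau^{3/2})$. The function name is hypothetical.

```python
import math


def intermediate_point(t):
    """Recover tau in (4.17) for x(t) = sqrt(t) with basic points 0.25 and 1."""
    t0, t1 = 0.25, 1.0
    x0, x1 = math.sqrt(t0), math.sqrt(t1)
    p = x0 + (t - t0) * (x1 - x0) / (t1 - t0)   # linear interpolant P_1(t)
    error = math.sqrt(t) - p                    # exact error x(t) - P_1(t)
    big_l = (t - t0) * (t - t1)                 # L(t)
    second = 2.0 * error / big_l                # x''(tau), since (n + 1)! = 2
    # x''(tau) = -1 / (4 tau^(3/2)), hence tau = (-1 / (4 x''(tau)))^(2/3)
    return (-1.0 / (4.0 * second)) ** (2.0 / 3.0)


tau = intermediate_point(0.49)
print(tau)
```

For $t = 0.49$ the recovered value is about 0.527, which indeed lies between the two basic points, as the derivation requires.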

These considerations do not enable us to determine $\tau$. Therefore,
in practice we can find the limiting error only if it is possible
to find an upper bound for the $(n + 1)$st derivative of the given
function. Let us denote by $M_{n+1}$ the maximum of the absolute
value of the $(n + 1)$st derivative in the region of the table or in the
extended region.
Then, for every given value of $t$, we obtain

$$|x(t) - P_n(t)| \le \frac{M_{n+1}}{(n + 1)!}\, |L(t)|.$$

If we wish, we may set a uniform upper bound to the error of
interpolation, that is, a bound not dependent on $t$. To do this, we
need to find a number $N$ such that $|L(t)| \le N$. Then,

$$|x(t) - P_n(t)| \le \frac{N M_{n+1}}{(n + 1)!}$$

for all values of $t$ in the tabulated or extended region.
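The pointwise estimate is easy to evaluate once a bound $M_{n+1}$ is available. The following sketch assumes that the caller supplies $M_{n+1}$ (it cannot be obtained from the table alone); the function name `error_bound` is illustrative.

```python
from math import factorial, prod


def error_bound(nodes, t, m):
    """Return M_{n+1} |L(t)| / (n + 1)! for the given basic points.

    nodes -- the n + 1 basic points t_0, ..., t_n
    m     -- an upper bound M_{n+1} for |x^(n+1)| (supplied by the caller)
    """
    big_l = prod(t - t_k for t_k in nodes)      # L(t) = (t - t_0) ... (t - t_n)
    return m * abs(big_l) / factorial(len(nodes))


# For x = sqrt(t) on [0.25, 1] one has |x''| <= 2, so M_2 = 2.
print(error_bound([0.25, 1.0], 0.49, 2.0))
```

For $x = \sqrt{t}$ with basic points 0.25 and 1, the bound at $t = 0.49$ is $|0.24 \cdot (-0.51)| = 0.1224$.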


Example 1. For $x = \sqrt{t}$, construct the interpolational polynomial from the two
basic points

    $t_0 = 0.25$,  $x_0 = 0.5$;
    $t_1 = 1$,     $x_1 = 1$.

By using Lagrange’s formula, we obtain the linear interpolational formula

$$P_1(t) = \frac{t - 1}{0.25 - 1} \cdot 0.5 + \frac{t - 0.25}{1 - 0.25} \cdot 1 = \frac{2t + 1}{3}.$$

To get an idea of the nature of the approximation, we note that

    $P_1(0.49) = 1.98/3 = 0.66$ instead of 0.70 (error = +0.04);
    $P_1(0.81) = 2.62/3 = 0.87$ instead of 0.90 (error = +0.03).

Let us find an estimate for the error in interpolation for all values of $t$ in this
region. From the general formula, we have

$$x(t) - P_1(t) = (t - 0.25)(t - 1)\, \frac{x''(\tau)}{2}, \qquad x'(t) = \frac{1}{2\sqrt{t}}, \qquad x''(t) = -\frac{1}{4\sqrt{t^3}},$$

where the primes denote the first and second derivatives. The quantity $|x''(t)|$ decreases
monotonically in this region, and, therefore,

$$|x''(\tau)| \le \frac{1}{4\sqrt{0.25^3}} = 2,$$

so that

$$|\sqrt{t} - P_1(t)| \le |(t - 0.25)(t - 1)|.$$

When $t = 0.49$, the right side is equal to 0.12. When $t = 0.81$, the right side is equal to
0.11. Both these values are considerably greater than the actual error. The polynomial
determining the estimate of the error is of the form

$$L(t) = (t - 0.25)(t - 1) = t^2 - 1.25t + 0.25.$$

$L(t)$ has an extremum at $t = 0.625$. Let us set $t = 0.625$ in $L(t)$ so as to obtain the
maximum of the absolute value of $L(t)$:

$$|L(t)| \le 0.375^2 = 0.14.$$

From this, we obtain an estimate of the error applicable to all values of $t$:

$$|\sqrt{t} - P_1(t)| \le 0.14.$$


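Example 2. For the function $x = \sin t$, let us construct the interpolational polynomial
from the three basic points

    t |  0   $\pi/6$   $\pi/2$
    x |  0   0.5       1

From Lagrange’s formula, we obtain

$$P(t) = \frac{t\,(t - \pi/2)}{\frac{\pi}{6}\left(\frac{\pi}{6} - \frac{\pi}{2}\right)} \cdot \frac{1}{2} + \frac{t\,(t - \pi/6)}{\frac{\pi}{2}\left(\frac{\pi}{2} - \frac{\pi}{6}\right)} \cdot 1.$$

To simplify our work, let us set $t = \pi\alpha$. Then,

$$P(\alpha) = -\frac{9}{2}\,\alpha(2\alpha - 1) + \alpha(6\alpha - 1) = \frac{7}{2}\,\alpha - 3\alpha^2.$$

Our control is

    t         | $\alpha$ | $P(\alpha)$ | sin t
    0         | 0        | 0           | 0
    $\pi/6$   | 1/6      | 1/2         | 1/2
    $\pi/2$   | 1/2      | 1           | 1

As a check on the quality of the approximation, we consider the special cases:

    t         | $\alpha$ | $P(\alpha)$    | sin t | Error
    $\pi/4$   | 1/4      | 11/16 = 0.69   | 0.71  | +0.02
    $\pi/3$   | 1/3      | 5/6 = 0.83     | 0.87  | +0.04

Let us estimate the error in the interpolation. Here,

$$L(t) = t\left(t - \frac{\pi}{6}\right)\left(t - \frac{\pi}{2}\right), \qquad |x'''(\tau)| = |\cos \tau| \le 1,$$

$$|\sin t - P(t)| \le \frac{1}{3!}\left| t\left(t - \frac{\pi}{6}\right)\left(t - \frac{\pi}{2}\right) \right| = \frac{\pi^3}{72}\, |\alpha(6\alpha - 1)(2\alpha - 1)|.$$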


Let us compare the errors obtained from the estimate made by use of this formula with
the actual errors in the particular cases given above:

    t         | Error | Estimate of the error
    $\pi/4$   | 0.02  | $\pi^3/1152 \approx 0.03$
    $\pi/3$   | 0.04  | $\pi^3/648 \approx 0.05$

In both cases, the error obtained by using the estimate with the exact polynomial
$L(t)$ is only slightly greater than the exact error.
Let us find a uniform estimate for all values of $t$. To do this, let us investigate the
behavior of $L(t)$ throughout the entire interval from 0 to $\pi/2$. In the present case,

$$L(t) = \frac{\pi^3}{12}\, \alpha(2\alpha - 1)(6\alpha - 1).$$

Since $t = \pi\alpha$, $\alpha$ has values from 0 to 1/2. Let us find the maximum of the absolute value
of the polynomial

$$M(\alpha) = 12\alpha^3 - 8\alpha^2 + \alpha$$

in this interval. We find the derivatives

$$\frac{dM}{d\alpha} = 36\alpha^2 - 16\alpha + 1; \qquad \frac{d^2M}{d\alpha^2} = 72\alpha - 16.$$

When we set the first derivative equal to 0,

$$36\alpha^2 - 16\alpha + 1 = 0,$$

we obtain the values of $\alpha$ at which $M(\alpha)$ has its maximum and minimum:

$$\alpha_{1,2} = \frac{4 \mp \sqrt{7}}{18}.$$

The plus sign gives the minimum and the minus sign gives the maximum. By a simple
transformation, we can reduce the expressions for the extrema of $M(\alpha)$ to the form

$$M(\alpha_{1,2}) = \frac{2}{27}\,(1 - 7\alpha_{1,2}).$$

To do this, we need to eliminate $\alpha^3$ and $\alpha^2$ by using the equation defining $\alpha_{1,2}$.

When we substitute the extreme values $\alpha_1$ and $\alpha_2$, we obtain

$$M(\alpha_1) = \frac{\sqrt{343} - 10}{243} = 0.0351; \qquad M(\alpha_2) = -\frac{\sqrt{343} + 10}{243} = -0.117.$$

The largest absolute value of $M(\alpha)$ throughout the entire region is 0.117.
The corresponding limiting value of the difference $|\sin t - P(t)|$ is equal to $\frac{\pi^3}{72} \cdot 0.117 = 0.0504$,
so that $|\sin t - P(t)| \le 0.0504$. This value is attained at $\alpha = \frac{4 + \sqrt{7}}{18} = 0.37$, or
$t = 0.37\pi = 67°$.*


*In the examples that we have been considering, the construction of the Lagrange
polynomial caused no difficulties since only a few basic points were given and the values
of the function and its arguments had few digits.
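The extrema of $M(\alpha) = 12\alpha^3 - 8\alpha^2 + \alpha$ and the resulting uniform bound for $|\sin t - P(t)|$ can be confirmed numerically. The sketch below uses illustrative names only, and compares the bound with the largest actual error on a grid of 1001 values of $\alpha$ between 0 and 1/2 (the grid is an arbitrary choice).

```python
import math


def m_poly(a):
    """M(alpha) = alpha (2 alpha - 1)(6 alpha - 1), expanded."""
    return 12 * a**3 - 8 * a**2 + a


# Roots of dM/d(alpha) = 36 alpha^2 - 16 alpha + 1 = 0.
alpha_1 = (4 - math.sqrt(7)) / 18      # maximum of M
alpha_2 = (4 + math.sqrt(7)) / 18      # minimum of M

uniform_bound = math.pi**3 / 72 * max(abs(m_poly(alpha_1)), abs(m_poly(alpha_2)))


def actual_error(a):
    """|sin t - P| at t = pi * alpha, with P(alpha) = 7/2 alpha - 3 alpha^2."""
    return abs(math.sin(math.pi * a) - (3.5 * a - 3 * a**2))


worst = max(actual_error(i / 2000) for i in range(1001))
print(m_poly(alpha_1), m_poly(alpha_2), uniform_bound, worst)
```

Without intermediate rounding the bound comes out near 0.0505; the value 0.0504 arises when $|M(\alpha_2)|$ is first rounded to 0.117. The largest actual error on the grid stays below the bound.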


Chapter 5
INTERPOLATION FROM A TABLE WITH 
A VARIABLE STEP 


19. DIFFERENCE QUOTIENTS OF TABULATED 
FUNCTIONS 


If a table of values of a function does not have a constant step
(that is, if the intervals between adjacent values given for the
argument are different in different parts of the table), the differences
between adjacent values of the function cannot be used to
denote the change in the function. For this, we use quantities known
as difference quotients.

Suppose that we have a table of a function $x(t)$ as follows:

    t |  $t_0$  $t_1$  $t_2$  ...  $t_n$
    x |  $x_0$  $x_1$  $x_2$  ...  $x_n$


The first or first-order difference quotient of two tabulated values
of a function is the ratio of the difference between the values of
the function to the corresponding difference between the values
of the argument. This definition is applicable to any two values
of the argument, but we will ordinarily be concerned only with
adjacent values. The first difference quotients are indicated by
placing the tabulated values of the argument in parentheses.
Thus, we denote the difference quotients for the table shown
above as follows:

$$x(t_1, t_0) = \frac{x_1 - x_0}{t_1 - t_0}, \qquad x(t_2, t_1) = \frac{x_2 - x_1}{t_2 - t_1}, \qquad x(t_3, t_2) = \frac{x_3 - x_2}{t_3 - t_2}, \quad \text{etc.}$$

The difference quotient $x(t_1, t_0) = \frac{x_1 - x_0}{t_1 - t_0}$ can be written in a
symmetric form:

$$x(t_1, t_0) = \frac{x_1}{t_1 - t_0} + \frac{x_0}{t_0 - t_1}.$$

It is clear from this that the order in which the tabulated values
are taken is immaterial in computing difference quotients; that is,

$$x(t_1, t_0) = x(t_0, t_1).$$


The second difference quotient of three tabulated values is the
ratio of the difference between two first-order difference quotients
(that is, the difference quotient of the third and second of these
values and that of the second and first) to the difference between
the third and first values of the argument. Formally, the definition
is applicable to any three tabulated values, but in practice, we
usually take three adjacent values. Using notation analogous to
that used for first-order difference quotients, let us write some
second-order difference quotients:

$$x(t_2, t_1, t_0) = \frac{x(t_2, t_1) - x(t_1, t_0)}{t_2 - t_0},$$

$$x(t_3, t_2, t_1) = \frac{x(t_3, t_2) - x(t_2, t_1)}{t_3 - t_1},$$

$$x(t_4, t_3, t_2) = \frac{x(t_4, t_3) - x(t_3, t_2)}{t_4 - t_2}.$$

If we use the symmetric expression for the first-order difference
quotients, we obtain, for example,

$$x(t_3, t_2, t_1) = \frac{x_3}{(t_3 - t_2)(t_3 - t_1)} + \frac{x_2}{(t_2 - t_3)(t_3 - t_1)} - \frac{x_2}{(t_2 - t_1)(t_3 - t_1)} - \frac{x_1}{(t_1 - t_2)(t_3 - t_1)}.$$

After some simple manipulations, we obtain

$$x(t_3, t_2, t_1) = \frac{x_3}{(t_3 - t_2)(t_3 - t_1)} + \frac{x_2}{(t_2 - t_3)(t_2 - t_1)} + \frac{x_1}{(t_1 - t_3)(t_1 - t_2)}.$$

From this it is clear that second-order difference quotients are
symmetric with respect to the tabulated values used; that is,
changing the order in which they are taken does not change their
values.

Difference quotients of arbitrary order are defined analogously.
For example, consider the third-order quotients

$$x(t_3, t_2, t_1, t_0) = \frac{x(t_3, t_2, t_1) - x(t_2, t_1, t_0)}{t_3 - t_0}, \qquad x(t_4, t_3, t_2, t_1) = \frac{x(t_4, t_3, t_2) - x(t_3, t_2, t_1)}{t_4 - t_1},$$

etc.

The independence of the order in which the values are taken
is a common property of difference quotients of all orders.
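The recursive definition lends itself to a column-by-column computation. The sketch below is one possible arrangement, with an illustrative function name; it also shows on a small hypothetical table that permuting the tabulated values leaves the highest-order quotient unchanged.

```python
def difference_quotients(nodes, values):
    """Return a list of columns; column j holds the j-th order difference quotients.

    Column 0 is the table itself; column j has len(nodes) - j entries.
    """
    columns = [list(values)]
    n = len(nodes)
    for order in range(1, n):
        prev = columns[-1]
        # Entry i involves the nodes t_i, ..., t_{i+order}.
        col = [(prev[i + 1] - prev[i]) / (nodes[i + order] - nodes[i])
               for i in range(n - order)]
        columns.append(col)
    return columns


cols = difference_quotients([0.0, 1.0, 3.0], [0.0, 1.0, 9.0])
perm = difference_quotients([3.0, 0.0, 1.0], [9.0, 0.0, 1.0])
print(cols[2][0], perm[2][0])   # the same second-order quotient in both orders
```

The three points here lie on $x = t^2$, so the second-order quotient equals 1 whichever order the points are taken in.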




If n+1 values of a tabulated function are given, we can con- 
struct n first-order difference quotients from that table, n—1 
second-order difference quotients, ..., and only one nth-order 
difference quotient. A value in the table itself is sometimes called 
a difference quotient of order 0. 


Example. Let us construct a table of the difference quotients of the function $x = \sin t$
in the interval from 0 to $\pi/2$ for the values

$$t = 0, \ \frac{\pi}{6}, \ \frac{\pi}{4}, \ \frac{\pi}{3}, \ \frac{\pi}{2}.$$

A complete table of the calculated difference quotients and the operations used in
obtaining them is shown on page 63.

The column marked $x^{(1)}$ shows the differences of consecutive values of the function
and the column marked $t^{(1)}$ shows the differences between the corresponding values of
the argument. Division of the numbers in the $x^{(1)}$ column by the adjacent number in
the $t^{(1)}$ column gives the first-order difference quotients. Their complete notations
would be

$$x(t_1, t_0), \ x(t_2, t_1), \ \ldots, \ \text{etc.}$$

The column marked $x^{(2)}$ contains the differences between successive first-order
difference quotients and the column marked $t^{(2)}$ contains the differences $t_2 - t_0$, $t_3 - t_1$,
and $t_4 - t_2$. Division of the numbers in the $x^{(2)}$ column by the adjacent number in the
$t^{(2)}$ column gives the corresponding second-order difference quotient. The procedure
for the remaining calculations is analogous. Since the numbers $x_k$ are given with
three digits, we take three digits in the third-order difference quotients. This coincides
with the number of digits to the right of the decimal. In the second third-order
difference quotient, this gives four digits to the right of the decimal. Therefore, the
first third-order difference quotient is also taken with four digits to the right of the
decimal, which is not completely legitimate since this figure is obtained by dividing
a three-digit dividend by a four-digit divisor.


20. THE CONSTRUCTION OF INTERPOLATIONAL 
FORMULAS REGARDING DIFFERENCES 


Calculations by means of Lagrange’s formula are rather tedious,
at least for manual computations. If we have $n + 1$ basic points,
the formula contains $n + 1$ terms. In each of these terms there
are $n$ factors in the numerator and just as many in the denominator;
besides, there is the tabular value of the function. In all,
each term represents $2n$ calculations, namely $2(n - 1) + 1$ multiplications
and one division. Considering multiplication and division
as equivalent operations and addition and subtraction as equivalent
operations, we have the following total number of operations:

    Additions:        $2n(n + 1) + n = 2n^2 + 3n$,
    Multiplications:  $2n(n + 1) = 2n^2 + 2n$.


Table of difference quotients for $x = \sin t$ (Example of Section 19; page 63)

    Tabulated values
        t:  0.000   0.524   0.785   1.047   1.571
        x:  0.000   0.500   0.707   0.866   1.000

    Order     x^(k)      t^(k)     Difference quotient x^(k) / t^(k)
    First      0.500     0.524      0.954
               0.207     0.261      0.793
               0.159     0.262      0.607
               0.134     0.524      0.256
    Second    -0.161     0.785     -0.205
              -0.186     0.523     -0.356
              -0.351     0.786     -0.447
    Third     -0.151     1.047     -0.1442
              -0.091     1.047     -0.0869
    Fourth    +0.0573    1.571     +0.0365



If $n = 4$, we have forty-four additions and forty multiplications. It
is thus clear that a large number of additions can lead to too
many errors. Furthermore, the organization of the calculations
in accordance with Lagrange’s formula is rather tedious.
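The operation counts quoted for Lagrange’s formula can be tabulated for any $n$. The helper below is only an illustrative sketch of the two expressions $2n^2 + 3n$ and $2n^2 + 2n$; its name is hypothetical.

```python
def lagrange_operation_counts(n):
    """Return (additions, multiplications) for n + 1 basic points."""
    additions = 2 * n * (n + 1) + n        # 2n^2 + 3n
    multiplications = 2 * n * (n + 1)      # 2n^2 + 2n
    return additions, multiplications


print(lagrange_operation_counts(4))   # (44, 40)
```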

Because of these defects, we turn now to an exposition of the 
method of difference interpolational formulas, 

Suppose that we have a table for the function $x = f(t)$

    t |  $t_0$  $t_1$  $t_2$  ...  $t_n$
    x |  $x_0$  $x_1$  $x_2$  ...  $x_n$

and that we wish to derive an expression for the interpolational
polynomial of $n$th degree (or of lower degree for certain special
choices of the numbers $x_0, x_1, \ldots, x_n$).

The condition of exact interpolation gives $n + 1$ basic equations

$$x_0 = P(t_0), \quad x_1 = P(t_1), \quad \ldots, \quad x_n = P(t_n)$$

for determining the coefficients $a_0, a_1, \ldots, a_n$ of the interpolational
polynomial

$$P(t) = a_0 + a_1 t + \ldots + a_n t^n.$$


It is clear from the basic equations that the values of the
corresponding difference quotients of the tabulated function and
the interpolational polynomial must be the same. When we equate
these values, we obtain, for the coefficients of the interpolational
polynomial, $n$ more equations containing first-order difference
quotients, $n - 1$ equations containing second-order difference quotients,
etc., and finally one equation with an $n$th-order difference
quotient.

The total number of equations of the system of equations that
is expanded in this manner will be equal to the sum of natural
numbers from 1 to $n + 1$, that is, to $\frac{(n + 1)(n + 2)}{2}$.


In the derivation of Lagrange’s formula (see Section 17) for
determining the coefficients $a_0, a_1, \ldots, a_n$, we used $n + 1$ basic equations.
However, these coefficients can be determined by choosing
an arbitrary set of $n + 1$ independent equations from the enlarged
system above. Ordinarily, we take one of the basic equations and
one equation each from the columns containing difference quotients
of the same order.

The coefficients determined from this system will be expressed
in terms of difference quotients from the zeroth to the $n$th order.
Substituting these coefficients into the polynomial $P(t)$, we obtain
a formula that we will call the formula for difference interpolation.

Let us consider the question of the choice of degree of the
interpolational polynomial. To begin with, let us consider the
case in which the tabular function is itself a polynomial $P(t)$ of
degree $n$. Let us construct its difference quotients with a variable
step. Let us call some particular value of the argument $t_0$. Then,
for an arbitrary value of $t$ different from $t_0$, we have

$$x(t, t_0) = \frac{P(t) - P(t_0)}{t - t_0}.$$

By a theorem of Bézout, the division of a polynomial $P(t)$ by
the difference $t - t_0$ gives a remainder of $P(t_0)$. Therefore, the
expression $P(t) - P(t_0)$ is divisible by $t - t_0$ without remainder and
the result is a polynomial of degree $n - 1$. It then follows that a
first-order difference quotient of an $n$th-degree polynomial is a
polynomial of degree $n - 1$. Analogously, we may show that a
second-order difference quotient is a polynomial of degree $n - 2$,
etc., and an $n$th-order difference quotient is a polynomial of
zeroth degree; that is, it is a constant. It is easy to show that this
constant is equal to the coefficient $a_n$ of the last (highest-degree) term
of the polynomial $P(t)$. The difference quotients of order greater than $n$ are
equal to 0.
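This property can be observed directly in exact arithmetic. The sketch below uses a hypothetical cubic and an arbitrarily chosen variable-step table; the names are illustrative. The third-order quotients all equal the leading coefficient and the higher orders vanish.

```python
from fractions import Fraction


def quotient_columns(nodes, values):
    """Columns of difference quotients, computed exactly with fractions."""
    nodes = [Fraction(t) for t in nodes]
    col = [Fraction(x) for x in values]
    columns = [col]
    for order in range(1, len(nodes)):
        col = [(col[i + 1] - col[i]) / (nodes[i + order] - nodes[i])
               for i in range(len(col) - 1)]
        columns.append(col)
    return columns


def cubic(t):
    """A hypothetical test polynomial of degree 3 with leading coefficient 2."""
    return 2 * t**3 - t + 5


table_nodes = [0, 1, 3, 4, 7, 8]                 # variable step
columns = quotient_columns(table_nodes, [cubic(t) for t in table_nodes])
print(columns[3], columns[4], columns[5])
```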

This property of a polynomial is of great significance in the 
problem of interpolation. Only in unusual cases can the given 
tabulated function be an exact polynomial and, when this happens, 
we see after compiling the table of difference quotients that the 
difference quotients of some order are constants. In the general 
case, the given function is not a polynomial and the difference 
quotients are not quite constant. However, in a great many 
applied problems, the tabulated values of the function are such that 
they can be approximated by a polynomial of some degree. In the 
table of difference quotients of such functions, the differences of 
some order will be very nearly constant and the differences of 
the next higher order will be quite small. In this case, we may 
confidently use point interpolation and construct a polynomial 
whose degree is equal to the order of the almost-constant dif- 
ferences, Therefore, the compilation of a table of difference 
quotients is the only method of finding out what degree is necessary 
for the interpolational polynomial, that is, of finding out the number 
of basic points that are necessary for a satisfactory approximation.

Recently, there has been a tendency to prefer the use of 
Lagrange’s formula in connection with machine calculation, but 
to solve the question of the necessary number of basic points we 
still must compile a table of difference quotients. 


21. NEWTON'S INTERPOLATIONAL FORMULA FOR 
A TABLE WITH VARIABLE STEP 


Newton and Gregory had already introduced a formula for difference
interpolation, containing difference quotients, that is different from
Lagrange’s formula (which was introduced later).
In this section, we shall give Newton’s formula, confining ourselves
in the derivation to the special case in which the number
of basic points is equal to 4, that is, to the case in which $n = 3$.



For this case, the enlarged system of equations can be reduced
to the form

    Basic equations:
        $x_0 = a_0 + a_1 t_0 + a_2 t_0^2 + a_3 t_0^3$   (*)
        $x_1 = a_0 + a_1 t_1 + a_2 t_1^2 + a_3 t_1^3$
        $x_2 = a_0 + a_1 t_2 + a_2 t_2^2 + a_3 t_2^3$
        $x_3 = a_0 + a_1 t_3 + a_2 t_3^2 + a_3 t_3^3$

    First-order difference quotients:
        $x(t_1, t_0) = a_1 + a_2 (t_1 + t_0) + a_3 (t_1^2 + t_1 t_0 + t_0^2)$   (*)
        $x(t_2, t_1) = a_1 + a_2 (t_2 + t_1) + a_3 (t_2^2 + t_2 t_1 + t_1^2)$
        $x(t_3, t_2) = a_1 + a_2 (t_3 + t_2) + a_3 (t_3^2 + t_3 t_2 + t_2^2)$

    Second-order difference quotients:
        $x(t_2, t_1, t_0) = a_2 + a_3 (t_2 + t_1 + t_0)$   (*)
        $x(t_3, t_2, t_1) = a_2 + a_3 (t_3 + t_2 + t_1)$

    Third-order difference quotient:
        $x(t_3, t_2, t_1, t_0) = a_3$   (*)

The first-order difference quotients of the polynomial, which
appear on the right, are obtained after cancellation of $t_1 - t_0$, $t_2 - t_1$,
$t_3 - t_2$. All the second-order difference quotients of the polynomial
are obtained in the form shown after the cancellation of $t_2 - t_0$,
$t_3 - t_1$. The third-order difference quotient is obtained after
cancelling by $t_3 - t_0$.

From these ten equations, we may choose any four. Let us
choose the equations written along the upper side of the hypothetical
isosceles triangle that we would get if we replaced each equation
with a point and connected the points with straight lines. These
are the equations marked (*) above. From them, we
obtain the values of the coefficients:


$$a_3 = x(t_3, t_2, t_1, t_0),$$

$$a_2 = x(t_2, t_1, t_0) - (t_2 + t_1 + t_0)\, x(t_3, t_2, t_1, t_0),$$

$$a_1 = x(t_1, t_0) - (t_1 + t_0)\, x(t_2, t_1, t_0) + (t_2 t_1 + t_1 t_0 + t_0 t_2)\, x(t_3, t_2, t_1, t_0),$$

$$a_0 = x_0 - t_0\, x(t_1, t_0) + t_1 t_0\, x(t_2, t_1, t_0) - t_2 t_1 t_0\, x(t_3, t_2, t_1, t_0).$$

Substitution of the coefficients into the desired polynomial yields

$$\begin{aligned}
P(t) = {} & x_0 - t_0\, x(t_1, t_0) + t_1 t_0\, x(t_2, t_1, t_0) - t_2 t_1 t_0\, x(t_3, t_2, t_1, t_0) \\
& + t\, x(t_1, t_0) - t\,(t_1 + t_0)\, x(t_2, t_1, t_0) + t\,(t_2 t_1 + t_1 t_0 + t_0 t_2)\, x(t_3, t_2, t_1, t_0) \\
& + t^2 x(t_2, t_1, t_0) - t^2 (t_2 + t_1 + t_0)\, x(t_3, t_2, t_1, t_0) \\
& + t^3 x(t_3, t_2, t_1, t_0).
\end{aligned}$$

We expand the polynomial obtained not in powers of $t$, but in difference
quotients of successive orders. Here, we consider the given
value as a difference quotient of zeroth order. The coefficients of
these difference quotients are polynomials whose degrees are equal
to the orders of the difference quotients. These coefficients are
easily factored by grouping them properly. The interpolational polynomial
then takes the form

$$P(t) = x_0 + (t - t_0)\, x(t_1, t_0) + (t - t_1)(t - t_0)\, x(t_2, t_1, t_0) + (t - t_2)(t - t_1)(t - t_0)\, x(t_3, t_2, t_1, t_0).$$




The polynomial is constructed in an analogous manner for an
arbitrary number of basic points.

For calculation, it is convenient to put $P(t)$ in the following
form:

$$P(t) = x_0 + (t - t_0)\,\{x(t_1, t_0) + (t - t_1)\,[x(t_2, t_1, t_0) + (t - t_2)\, x(t_3, t_2, t_1, t_0)]\}.$$

The calculations are performed "from the end": the third-order
difference quotient is multiplied by $t - t_2$; the second-order difference
quotient is added to the resulting product; this sum is
multiplied by $t - t_1$; the first-order difference quotient is added
to this product; this sum is multiplied by $t - t_0$; and finally, the
zeroth-order difference quotient is added to the product.

The formula and the method of calculation can easily be
generalized for an arbitrary number of basic points. In working
with a calculating machine, we hardly need to write anything
besides the numbers $t - t_0, t - t_1, \ldots$ in compiling the table of
difference quotients.
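The computation "from the end" generalizes to any number of basic points. A minimal sketch follows, with illustrative function names: the first function produces the quotients $x_0, x(t_1, t_0), x(t_2, t_1, t_0), \ldots$ along the upper edge of the table, and the second evaluates the nested form.

```python
def newton_coefficients(nodes, values):
    """Return x_0, x(t_1, t_0), x(t_2, t_1, t_0), ... for the given table."""
    coeffs = list(values)
    n = len(nodes)
    for order in range(1, n):
        # Work from the bottom up so that lower-order entries are still intact.
        for i in range(n - 1, order - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (nodes[i] - nodes[i - order])
    return coeffs


def newton_eval(nodes, coeffs, t):
    """Evaluate the nested form of P(t), starting with the highest-order quotient."""
    result = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        result = result * (t - nodes[k]) + coeffs[k]
    return result


# A hypothetical table of x = t^3 with a variable step.
table_t = [0.0, 1.0, 3.0, 4.0]
quot = newton_coefficients(table_t, [t**3 for t in table_t])
print(newton_eval(table_t, quot, 2.0))
```

Since four basic points determine a cubic exactly, the interpolated value at $t = 2$ reproduces $2^3 = 8$.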


Example. Suppose that we have the following table of values of the sine:

    t     |  0       $\pi/6$   $\pi/4$   $\pi/3$   $\pi/2$
    sin t |  0.000   0.500     0.707     0.866     1.000

We wish to construct an interpolational formula and compute from it the value of the
sine for

$$t = 36° = \frac{\pi}{5} = 0.628.$$

We compile the table of difference quotients (the quotients are listed by order)

      t      x = sin t    1st       2nd       3rd        4th        t - t_k
    0.000     0.000*                                                 0.628
                          0.954*
    0.524     0.500                -0.205*                           0.104
                          0.793               -0.1442*
    0.785     0.707                -0.356                +0.0365*   -0.157
                          0.607               -0.0869
    1.047     0.866                -0.447                           -0.419
                          0.256
    1.571     1.000

The difference quotients used in the formula are marked with an asterisk in the table. The last
column contains the numbers $t - t_0$, $t - t_1$, $t - t_2$, and $t - t_3$. For arbitrary values of $t$,
the formula is of the form

$$P(t) = t\,\{0.954 + (t - 0.524)\,[(-0.205) + (t - 0.785)\,(-0.1442 + (t - 1.047)(+0.0365))]\}.$$



For a given value of $t$, the following operations are carried out on a calculating machine:

(1) The number 0.0365 is multiplied by -0.419. To this product is added -0.1442000.
The result is rounded off on the machine. We therefore obtain -0.1595.

(2) This number is transferred to the calculating machine. It is multiplied by
-0.157 and to the product is added -0.2050000. After rounding off, we obtain -0.1800.

(3) This number is in turn fed into the machine and multiplied by 0.104. To the
resulting product, we add 0.9540000. The sum is rounded off, yielding 0.9353.

(4) The number 0.9353 is fed into the machine and multiplied by 0.628. We do not
need to add anything because the initial value is equal to 0. The resulting number is
0.5874. The fourth significant figure is a reserve figure, just as in the intervening
calculations. From the table, we obtain 0.5878. Consequently, we know the first three
digits for certain.

By means of the formulas of Section 18, Chapter 4, we can make an estimate of the
interpolational error. In the present case, the absolute value of the derivative of any
order does not exceed unity. Therefore,

$$|\sin t - P(t)| \le \frac{|L(t)|}{5!},$$

where

$$L(t) = 0.628 \cdot 0.104 \cdot (-0.157) \cdot (-0.419) \cdot (-0.943).$$

We do not need to calculate $L(t)$. We may somewhat increase all the factors in order
to make the estimate of the error simpler:

$$|L(t)| < 0.7 \cdot 0.1 \cdot 0.2 \cdot 0.5 \cdot 1.0 = 0.007.$$

Thus,

$$|\sin t - P(t)| < \frac{0.007}{120} \approx 0.00006.$$

This error is considerably less than the actual error. The explanation is quite
simple. The formula for determining the error does not contain values of the function
and, consequently, does not allow for the exactness of the given values of the function.
The formula only allows for the error inherent in the interpolation with the assumption
that the given values of the function are exact.

In the present case, certain values of the function are given with three definitely
known digits and therefore we cannot obtain a more precise interpolational result. The
estimate that we have obtained shows that, for a given function, we cannot obtain an
interpolation result more accurate than up to $0.5 \cdot 10^{-4}$, that is, with four digits to the
right of the decimal. To ensure such accuracy, we would need to have the function
tabulated with four digits to the right of the decimal.
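The four machine steps and the error estimate of this example can be replayed in a few lines. The sketch below assumes that the machine rounds each intermediate result to four decimal places, as the quoted figures suggest; the names are illustrative.

```python
from math import prod

# Upper edge of the difference-quotient table and the offsets t - t_k for t = 0.628.
quotients = [0.000, 0.954, -0.205, -0.1442, 0.0365]
offsets = [0.628, 0.104, -0.157, -0.419]


def nested_steps(quotients, offsets, digits=4):
    """Return the rounded intermediate results of the 'from the end' scheme."""
    acc = quotients[-1]
    steps = []
    for q, d in zip(reversed(quotients[:-1]), reversed(offsets)):
        acc = round(acc * d + q, digits)    # multiply, add, round off
        steps.append(acc)
    return steps


steps = nested_steps(quotients, offsets)

# Error estimate |L(t)| / 5! with the exact factors (the text rounds them upward).
bound = abs(prod(offsets + [-0.943])) / 120
print(steps, bound)
```

The successive results are -0.1595, -0.18, 0.9353, and 0.5874, in agreement with steps (1) to (4). With the unrounded factors the estimate is somewhat smaller than 0.00006.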


Chapter 6
INTERPOLATION FROM A TABLE WITH
A CONSTANT STEP


22. ORDINARY AND CENTRAL DIFFERENCES
OF A TABULATED FUNCTION WITH
CONSTANT STEP


Let us suppose that a table has been compiled for a function and
that at least in the part of the table in which we are interested
the step of the table is constant:

    t |  ...  $t_{-2}$  $t_{-1}$  $t_0$  $t_1$  $t_2$  ...
    x |  ...  $x_{-2}$  $x_{-1}$  $x_0$  $x_1$  $x_2$  ...

where $t_k = t_0 + hk$, the constant $h$ being the step, that is, the difference
between two adjacent tabulated values of $t$. Three kinds of
differences come up in connection with such a table:

(1) ordinary differences;

(2) central differences; and

(3) differences of negative orders.


1. Ordinary differences 


Let us write the table in the form of two columns for $t$ and $x$ and
let us leave a blank space between every pair of consecutive
values. Both columns are placed near the left margin of the page.
Let us perform the following operation on the entries appearing in
the column for the function: from each number we subtract the
preceding number and we write the difference in a third column
(to the right) on the same level as the empty space between the two
values whose difference we are taking:

    t_{-3}   x_{-3}
                      x^1_{-5/2}
    t_{-2}   x_{-2}                x^2_{-2}
                      x^1_{-3/2}              x^3_{-3/2}
    t_{-1}   x_{-1}                x^2_{-1}               x^4_{-1}
                      x^1_{-1/2}              x^3_{-1/2}             x^5_{-1/2}
    t_0      x_0                   x^2_0                  x^4_0                  x^6_0
                      x^1_{1/2}               x^3_{1/2}              x^5_{1/2}
    t_1      x_1                   x^2_1                  x^4_1
                      x^1_{3/2}               x^3_{3/2}
    t_2      x_2                   x^2_2
                      x^1_{5/2}
    t_3      x_3


This gives us a column of first-order differences. For these 
differences, we shall use the notations shown in the third column 
of the system: 


$$x^1_{-5/2} = x_{-2} - x_{-3}, \qquad x^1_{-3/2} = x_{-1} - x_{-2}, \quad \text{etc.}$$

Here, the subscript is the arithmetic mean of the subscripts of 
those values of the function whose difference we have taken. The 
superscript 1 indicates a first-order difference. We now perform 
the same operation on the entries in the column of first-order 
differences. Thus we obtain a column of second-order differences, 
etc. For the entire table, we use a single system of notation: 
a difference of order $k$ is denoted by a superscript $k$. A subscript 
denotes the arithmetic mean of the subscripts of those $(k-1)$st-order 
differences from which the difference in question is formed. 
With such a system of notation, all the odd-order differences have 
fractional subscripts with denominator 2 and all even-order 
differences have integral subscripts. Differences of consecutive orders 
that appear in a single row have the same subscript. Differences 
of different orders, including the values of the function (which are 
zeroth-order differences), appear in the isosceles triangle with 
gaps (between the rows) in every row and in every column. Suppose 
that we have a difference denoted by $x^k_s$, where $k$ is a positive 
integer and $s$ is either an integer (for $k$ even) or a half-integer 
(for $k$ odd). From the definition of a difference, we have the 
following equation relating the difference $x^k_s$ with the differences 
of next lower order: 

$$x^k_s = x^{k-1}_{s+1/2} - x^{k-1}_{s-1/2}.$$

If $n+1$ values of a function are given, this table will contain $n$ 
first-order differences, $n-1$ second-order differences, etc. The 
column for $n$th-order differences will contain only one number and 
with it the table of differences is completed. 

The highest order of the differences in the table is one less 
than the number of values of the function that are tabulated.
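
The construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `difference_table` and the sample values are hypothetical and do not come from the text. Subscripts are kept as exact fractions so that odd-order differences receive half-integer subscripts, as in the notation above.

```python
from fractions import Fraction

def difference_table(values, first_index=0):
    """Return {order: {subscript: difference}} for a table with constant step."""
    column = {Fraction(first_index + i): v for i, v in enumerate(values)}
    table = {0: column}
    order = 0
    while len(column) > 1:
        keys = sorted(column)
        # From each entry subtract the preceding one; the new subscript is
        # the arithmetic mean of the two subscripts involved.
        column = {(lo + hi) / 2: column[hi] - column[lo]
                  for lo, hi in zip(keys, keys[1:])}
        order += 1
        table[order] = column
    return table

# Hypothetical data: x_k = (k + 3)**3 at the nodes k = -2, ..., 2.
table = difference_table([1, 8, 27, 64, 125], first_index=-2)
for order, column in table.items():
    print(order, {str(s): d for s, d in column.items()})
```

With five tabulated values the table ends at the single fourth-order difference, in agreement with the count given above.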


2. Central differences. 


In certain formulas, it is convenient to introduce the arithmetic 
means of adjacent ordinary differences that appear in a single 
column. These differences are called central or centered 
differences. We use the same system of subscripts and superscripts 
for them as for ordinary differences; that is, the subscript used 
with a central difference is equal to the arithmetic mean of the 
subscripts of those differences from which the central difference 
is formed: 

$$x^k_s = \tfrac{1}{2}\left(x^k_{s-1/2} + x^k_{s+1/2}\right).$$


Here, the central differences of odd order have integral subscripts 
and those of even order have fractional subscripts. This removes 
the danger of confusing central differences with ordinary differ- 
ences because the situation is the reverse with ordinary differ- 
ences; hence we can use the same notation for both without con- 
fusion. Central differences are written in the blank spaces between 
the ordinary differences from which they are formed. 

It is easy to show that the formula giving the relationship 
between the $k$th-order difference and the differences of order 
$k-1$ remains valid for central differences of all orders. Suppose 
that $x^k_s$ is a $k$th-order central difference defined by the last 
formula given above. When we express the ordinary differences 
of $k$th order that appear in this formula in terms of the ordinary 
differences of $(k-1)$st order, we obtain 

$$x^k_s = \tfrac{1}{2}\left[\left(x^{k-1}_{s} - x^{k-1}_{s-1}\right) + \left(x^{k-1}_{s+1} - x^{k-1}_{s}\right)\right].$$

If we regroup the terms within the square brackets and if we put 
the factor 1/2 inside, we obtain, from the definition of central 
differences, 

$$\tfrac{1}{2}\left(x^{k-1}_{s+1} + x^{k-1}_{s}\right) = x^{k-1}_{s+1/2}, \qquad \tfrac{1}{2}\left(x^{k-1}_{s} + x^{k-1}_{s-1}\right) = x^{k-1}_{s-1/2}.$$

Therefore, 

$$x^k_s = x^{k-1}_{s+1/2} - x^{k-1}_{s-1/2},$$

which is the desired relationship. 


The number of central differences in each column is one less 
than the number of ordinary differences of the 
same order (since they appear in the spaces between the ordinary 
differences and since the number of spaces between $m$ points is 
$m-1$). Accordingly, if the column for the zeroth-order values of 
the function contains $n+1$ numbers, there will be $n$ zeroth-order 
central differences. The highest order of central differences will 
be $n-1$ (there will be one such central difference). If we include 
all the central differences in our table of differences, the isosceles 
triangle will be completely filled out. 
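
As a rough check of the statements about central differences, the following Python sketch forms them as means of adjacent ordinary differences and verifies both the count and the recurrence derived above. The helper names and the sample values are assumptions made for illustration only.

```python
from fractions import Fraction

def ordinary_column(values, order, first_index=0):
    """Ordinary differences of the given order, keyed by subscript."""
    column = {Fraction(first_index + i): v for i, v in enumerate(values)}
    for _ in range(order):
        keys = sorted(column)
        column = {(lo + hi) / 2: column[hi] - column[lo]
                  for lo, hi in zip(keys, keys[1:])}
    return column

def central_column(values, order, first_index=0):
    """Central differences: means of adjacent ordinary differences of one order."""
    column = ordinary_column(values, order, first_index)
    keys = sorted(column)
    return {(lo + hi) / 2: (Fraction(column[lo]) + column[hi]) / 2
            for lo, hi in zip(keys, keys[1:])}

values = [1, 8, 27, 64, 125]                 # hypothetical table, nodes -2, ..., 2
central0 = central_column(values, 0, -2)     # half-integer subscripts
central1 = central_column(values, 1, -2)     # integer subscripts
# Recurrence for central differences: x^1_0 = x^0_{1/2} - x^0_{-1/2}.
print(central1[0], central0[Fraction(1, 2)] - central0[Fraction(-1, 2)])
```

For these five values there are four zeroth-order central differences and a single central difference of the highest order, three.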

We might note that other symbols and another system of 
notation can be used for indicating ordinary differences. For 
example, 


When this notation is used, differences with the same subscripts 
are written in a single column and all differences are placed in a 
right triangle. This notation for ordinary differences cannot be 
extended to central differences because writing them in a right 
triangle with no blank spaces does not allow the inclusion of 
central differences in the diagram, Therefore, in using this 
notation in formulas containing central differences, we do not use 
special notation for the latter, but simply express them in terms 
of the ordinary differences. This complicates calculations. 


3. Differences of negative orders. 


Of any two adjacent columns in a table of differences of a function, 
the column on the right contains the first-order differences of 
the column on the left. If we consider only this relationship 
between two adjacent columns, it would be possible to add a 
number to all members of the column on the left (the same 
number in each case) without changing the column on the right. 
If we go from right to left across the table of differences, this 
relationship between two adjacent columns is interrupted at the 
column giving the values of the function (that is, the column of 
zeroth-order differences). In certain problems (for example, in 
the numerical solution of differential equations), it is convenient 
to construct other columns to the left of the zeroth-order column 
that contain the numbers for which the values of the function are 
differences. These numbers are called differences of the 
minus-first order. It follows from the definition that one of the 
differences of the minus-first order can be written arbitrarily, for 
example, the first one. It will appear in its column one row higher 
than the first value of the function. If we add this number to the 
first value of the function, we obtain the second value of the 
minus-first-order difference. Addition of this second value to the 
second value of the function yields the third value of the 
minus-first-order difference, etc. 




After the column of minus-first-order differences has been 
compiled, we may construct minus-second-order differences to 
the left of it, etc. These are denoted by the same system that we 
use for differences of positive order. If the first number in the 
zeroth-order column has subscript $s$, the first number in the 
minus-first-order column will have the subscript $s - 1/2$; in the 
case of the minus-second-order differences, it will be $s - 1$, etc. 
The subscripts of the last numbers in each column on the left 
will exceed by $1/2$ the subscripts in the neighboring column on 
the right. After a choice has been made for the first number in 
each negative-order column, the remaining differences of this 
order are obtained by successive additions of the differences 
of the next higher order, and therefore the columns of 
negative-order differences are sometimes called columns of sums. 
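
A column of sums can be sketched with a running total. In this hedged example the function name `column_of_sums`, the starting value, and the data are all hypothetical; the point is only that the first entry is arbitrary and that differencing the result returns the original column.

```python
from itertools import accumulate

def column_of_sums(column, first=0):
    """Build the column to the left: its first-order differences equal `column`.

    `first` is the arbitrary first entry of the new column.
    """
    return list(accumulate(column, initial=first))

function_values = [3, 5, 7]                       # hypothetical zeroth-order column
minus_first = column_of_sums(function_values, first=10)
minus_second = column_of_sums(minus_first, first=0)
recovered = [b - a for a, b in zip(minus_first, minus_first[1:])]
print(minus_first, minus_second, recovered)
```

Changing `first` shifts every entry of the new column by the same amount, which is the arbitrary addend mentioned in the text.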


23. THE BASIC PROPERTIES OF ORDINARY 
DIFFERENCES 


1. The representation of differences of various orders in terms of 
tabulated values of a function 


Consider the table of ordinary differences: 


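$$
\begin{array}{cccccc}
 & & x^1_{1/2} \\
t_1 & x_1 & & x^2_1 \\
 & & x^1_{3/2} & & x^3_{3/2} \\
t_2 & x_2 & & x^2_2 & & x^4_2 \\
 & & x^1_{5/2} & & x^3_{5/2} \\
t_3 & x_3 & & x^2_3 \\
 & & x^1_{7/2}
\end{array}
$$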


This table can be continued both downward and upward—using 
negative subscripts for the values of the function and the differences 
in the latter case. 

By definition, differences of a particular order are expressed 
in terms of the next-lower-order differences, but in some problems 
we need to have a difference of arbitrary order directly expressed 
in terms of the tabulated values of the function. Such expressions 
are easily obtained by induction from an examination of particular 
cases. 

For second-order differences, we have 

$$x^2_1 = x^1_{3/2} - x^1_{1/2} = (x_2 - x_1) - (x_1 - x_0) = x_2 - 2x_1 + x_0$$

and analogously, 

$$x^2_k = x_{k+1} - 2x_k + x_{k-1},$$

where $k$ is an integer (positive, negative, or zero). Furthermore, 

$$x^3_{1/2} = x^2_1 - x^2_0 = x_2 - 2x_1 + x_0 - (x_1 - 2x_0 + x_{-1}) = x_2 - 3x_1 + 3x_0 - x_{-1},$$

$$x^3_{3/2} = x^2_2 - x^2_1 = x_3 - 2x_2 + x_1 - (x_2 - 2x_1 + x_0) = x_3 - 3x_2 + 3x_1 - x_0,$$

$$\dots$$


From these examples, it is clear that the relationship between the 
ordinary differences and the tabulated values of the function is 
expressed by formulas with the binomial coefficients: 


$$x^{2p}_k = x_{k+p} - C^1_{2p}\,x_{k+p-1} + C^2_{2p}\,x_{k+p-2} - \dots + (-1)^p C^p_{2p}\,x_k + \dots + x_{k-p},$$

$$x^{2p+1}_{k+1/2} = x_{k+p+1} - C^1_{2p+1}\,x_{k+p} + C^2_{2p+1}\,x_{k+p-1} - \dots + (-1)^p C^p_{2p+1}\,(x_{k+1} - x_k) + \dots - x_{k-p}$$

(where $k$ is an integer and $p$ is a positive integer). 

These formulas are suitable only for differences of positive 
order. For negative-order differences, we have no such formulas 
because these differences are defined only up to an arbitrary 
addend. 
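
The binomial representation can be checked numerically. The sketch below is an illustration with hypothetical names and data: `top` stands for the largest subscript of a tabulated value entering the difference (that is, $k+p$ for $x^{2p}_k$ and $k+p+1$ for $x^{2p+1}_{k+1/2}$), and the result is compared with repeated subtraction.

```python
from math import comb

def difference_from_values(x, order, top):
    """Difference of the given order written directly through tabulated values."""
    return sum((-1) ** j * comb(order, j) * x[top - j] for j in range(order + 1))

def repeated_differences(values, order):
    """The same differences obtained column by column."""
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values

x = [0, 1, 8, 27, 64, 125]          # hypothetical values x_k = k**3, k = 0, ..., 5
for order in range(1, len(x)):
    direct = [difference_from_values(x, order, i + order)
              for i in range(len(x) - order)]
    print(order, direct == repeated_differences(x, order))
```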


2. The inversion of a table. 


Suppose that all the numbers in the table of values of a function 
are written in inverse order and that the differences of consecutive 
orders are formed according to the same principle as before: from 
each number we subtract the preceding one. It is easy to see 
that all first-order differences will remain the same in absolute 
value but their signs will be reversed. The second-order 
differences change neither in absolute value nor in sign. This is true 
because the expression 

$$x^2_k = x_{k+1} - 2x_k + x_{k-1}$$

is symmetric with respect to $x_k$ and its value is not changed if we 
replace the three numbers $x_{k-1}$, $x_k$, and $x_{k+1}$ with $x_{k+1}$, $x_k$, and 
$x_{k-1}$, respectively. It is clear from the general expressions for 
odd and even orders that all even-order differences remain 
unchanged and that odd-order differences change their sign. It 
follows from this that we must always keep to the chosen order 
of subtraction in calculating differences (that is, we must subtract 
the preceding value from each value), since violation of this order 
is equivalent to reversing the table with a resultant change in the 
sign of the odd-order differences. 
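
The effect of inverting a table is easy to confirm numerically. In this sketch the helper `differences` and the sample values (a cubic chosen for illustration) are assumptions, not part of the text.

```python
def differences(values, order):
    """Differences of the given order: each entry minus the preceding one."""
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values

x = [-22, -5, 0, -1, -2, 3]      # illustrative values of t**3 - 3*t**2 + t, t = -2, ..., 3
inverted = x[::-1]               # the same table written in inverse order

for order in range(1, 5):
    direct = differences(x, order)
    flipped = differences(inverted, order)[::-1]   # put the rows back in the original order
    sign = -1 if order % 2 else 1
    print(order, flipped == [sign * d for d in direct])
```

Every line printed should report `True`: odd orders change sign, even orders do not.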


3. The sum of differences of a single order. 


If we write all the ordinary differences of a tabulated function with 
$n+1$ basic points according to the plan given above, we shall obtain 
an isosceles triangle intersected by straight lines parallel to the 
base. Each of these straight lines contains all the differences of 
a particular order. Let us consider the column of differences of 
order $s$. If the basic point with subscript 0 appears at the 
beginning of the table, the first (upper) number of the column of 
differences will have the subscript $s/2$ and the last difference will 
have the subscript $n - s/2$. If we add all the differences of order $s$, 
replacing each of them with the difference between the differences 
of order $s-1$, we obtain 

$$\sum_{k=s/2}^{n-s/2} x^s_k = \sum_{k=s/2}^{n-s/2}\left(x^{s-1}_{k+1/2} - x^{s-1}_{k-1/2}\right).$$

If we expand the right side, we obtain 

$$\sum_{k=s/2}^{n-s/2} x^s_k = \left(x^{s-1}_{n-\frac{s-1}{2}} - x^{s-1}_{n-1-\frac{s-1}{2}}\right) + \left(x^{s-1}_{n-1-\frac{s-1}{2}} - x^{s-1}_{n-2-\frac{s-1}{2}}\right) + \dots + \left(x^{s-1}_{\frac{s+3}{2}} - x^{s-1}_{\frac{s+1}{2}}\right) + \left(x^{s-1}_{\frac{s+1}{2}} - x^{s-1}_{\frac{s-1}{2}}\right).$$

All the terms other than the first and the last cancel each other 
out, so that 

$$\sum_{k=s/2}^{n-s/2} x^s_k = x^{s-1}_{n-\frac{s-1}{2}} - x^{s-1}_{\frac{s-1}{2}}.$$

The numbers on the right represent the first and last numbers in 
the column of differences of order $s-1$. Thus, the sum of all the 
differences of any order is equal to the difference between the 
last and the first numbers in the column of differences of the 
next-lower order. This property is used in certain applications 
and it is also used as a check in setting up a table of differences. 
Although the operations required in setting up these differences 
are extremely simple, nonetheless errors of carelessness are 
possible both with manual and with machine calculations. A check 
should then give exact equality between the sum of all the 
differences of order $s$ and the difference between the bottom and top 
numbers in the preceding column of differences. 
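
The check described above can be sketched as a short routine. The name `column_sum_check` and the sample column are hypothetical; the test is simply that the sum of the differences of order `s` equals the last minus the first entry of the column of order `s - 1`.

```python
def differences(values, order):
    """Differences of the given order: each entry minus the preceding one."""
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values

def column_sum_check(values, s):
    """True if the column of order s passes the summation check."""
    lower = differences(values, s - 1)
    return sum(differences(values, s)) == lower[-1] - lower[0]

x = [3, 1, 4, 1, 5, 9, 2, 6]     # arbitrary illustrative values
print([column_sum_check(x, s) for s in range(1, len(x))])
```

With integer or exactly represented entries the equality is exact, so a single `False` points to a slip in the corresponding column.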


4. Polynomial differences. 


It is easy to show, in the same way as in Section 20, that the 
ordinary differences of $n$th order are constant for a polynomial 
of degree $n$ and that the differences of higher order are equal to 
0. Just as in the case of a table with a variable step, this property 
can be used to find the degree of the interpolational polynomial 
for a table with constant step. This degree is equal to the order 
of those differences that are almost constant for the given tabulated 
function. 
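
One way to mechanize this criterion is sketched below. The function name `interpolation_degree` and its `tolerance` parameter are assumptions introduced for illustration: the routine returns the lowest order at which the spread of the differences does not exceed the tolerance.

```python
def interpolation_degree(values, tolerance=0):
    """Lowest order whose differences are constant to within `tolerance`."""
    column = list(values)
    order = 0
    while len(column) > 1:
        if max(column) - min(column) <= tolerance:
            return order
        column = [b - a for a, b in zip(column, column[1:])]
        order += 1
    return order

cubic = [t**3 - 3 * t**2 + t for t in range(-2, 4)]   # illustrative cubic
print(interpolation_degree(cubic))
```

For rounded tabular data a nonzero tolerance of the order of the accumulated rounding error would be needed; the exact value to use is a judgment call not settled by this sketch.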


5. The effect of error on the differences in a table. 


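Let us suppose that an error $\varepsilon$ is made in the value of the 
"middle" number $x_0$ in the table of values of a function; that is, 
instead of $x_0$, the table reads $x_0 + \varepsilon$. Let us show in a single 
diagram the differences that would appear in an exact table and 
the differences that appear in the table with the error: 

$$
\begin{array}{c|ccccccc}
t & x \\ \hline
t_0 - 3h & x_{-3} \\
 & & x^1_{-5/2} \\
t_0 - 2h & x_{-2} & & x^2_{-2} \\
 & & x^1_{-3/2} & & x^3_{-3/2} + \varepsilon \\
t_0 - h & x_{-1} & & x^2_{-1} + \varepsilon & & x^4_{-1} - 4\varepsilon \\
 & & x^1_{-1/2} + \varepsilon & & x^3_{-1/2} - 3\varepsilon & & x^5_{-1/2} + 10\varepsilon \\
t_0 & x_0 + \varepsilon & & x^2_0 - 2\varepsilon & & x^4_0 + 6\varepsilon & & x^6_0 - 20\varepsilon \\
 & & x^1_{1/2} - \varepsilon & & x^3_{1/2} + 3\varepsilon & & x^5_{1/2} - 10\varepsilon \\
t_0 + h & x_1 & & x^2_1 + \varepsilon & & x^4_1 - 4\varepsilon \\
 & & x^1_{3/2} & & x^3_{3/2} - \varepsilon \\
t_0 + 2h & x_2 & & x^2_2 \\
 & & x^1_{5/2} \\
t_0 + 3h & x_3
\end{array}
$$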

It is clear from this diagram that an error in one number in 
the table of values of the function has an effect on two first-order 
differences, on three second-order differences, etc. In the column 
of differences of a single order, these errors are multiples of 
the original error and their coefficients are the binomial coefficients 
with the degree of the binomial equal to the order of the difference. 
In even-order differences, the maximum error always appears 
on the same row as the original error; in odd-order differences, the 
two largest errors lie immediately above and below this row and 
have opposite signs. In every column, the signs of the errors 
alternate in consecutive differences. 
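
The pattern in the diagram can be reproduced by differencing a table that is zero everywhere except for a unit error in the middle entry. The helper name `differences` is an assumption; the printed columns are the binomial coefficients with alternating signs.

```python
from math import comb

def differences(values, order):
    """Differences of the given order: each entry minus the preceding one."""
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values

unit_error = [0, 0, 0, 1, 0, 0, 0]     # error of 1 in x_0, seven basic points
for order in range(1, 7):
    column = differences(unit_error, order)
    print(order, column, "largest coefficient:", comb(order, order // 2))
```

Multiplying each printed entry by $\varepsilon$ gives the error terms shown in the corresponding column of the diagram.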

These differences are used in the following manner in detecting 
errors in a table: in the more common tables of functions that 
are used in the natural sciences, the differences of some order 
or other almost always become sufficiently small.* If there is a 
large error in one of the numbers listed in the table, this error 
is added (with the binomial coefficients and alternating signs) to 
small differences. This leads to wide fluctuations in the values 
of the differences with alternation of signs beginning with the 
difference of some particular order. In such cases, the 
differences are said to "jump." These jumps in the differences can 
serve to locate the error. It should then be sought on the line with 
the greatest jumps (or the line next to it). An approximation to 
the value of the error is obtained by dividing the jump by the 
maximum binomial coefficient whose degree is equal to the order 
of the difference. For a jump, we may take the difference with 
the largest absolute value. 

The problem of the influence of errors in the table on errors 
in the differences is rather complicated because in a table almost 
all numbers contain errors in rounding off; each of these errors 
in turn introduces errors in the differences of all orders, and 
these errors add up. An exact calculation of errors of this sort 
would be extremely tedious. Therefore, it is advisable to use the 
probability method. 


Example. Consider a table of values of a third-degree polynomial. Let us compile a 
table of its differences: 

$$
\begin{array}{r|rrrrrr}
t & x \\ \hline
-2 & -22 \\
 & & +17 \\
-1 & -5 & & -12 \\
 & & +5 & & +6 \\
0 & 0 & & -6 & & 0 \\
 & & -1 & & +6 & & 0 \\
+1 & -1 & & 0 & & 0 \\
 & & -1 & & +6 \\
+2 & -2 & & +6 \\
 & & +5 \\
+3 & +3
\end{array}
\qquad
\begin{array}{r|rrrrrr}
t & x \\ \hline
-2 & -22 \\
 & & +17 \\
-1 & -5 & & -12 \\
 & & +5 & & +7 \\
0 & 0 & & -5 & & -4 \\
 & & 0 & & +3 & & +10 \\
+1 & 0 & & -2 & & +6 \\
 & & -2 & & +9 \\
+2 & -2 & & +7 \\
 & & +5 \\
+3 & +3
\end{array}
$$

Beside it is printed almost exactly the same table except that an error 
of $-1$ has been made for $t = +1$ (the table reads 0 in place of the exact value $-1$, 
so that the exact value minus the tabulated value is $-1$). A jump is already visible 
in the third-order differences (the minimum) and becomes noticeable in the 
fourth-order differences. According to the 
preceding diagram, the error is to be sought on the row of the maximum jump, that is, the row 
corresponding to $t = +1$. In the fourth-order differences, the maximum binomial 
coefficient is equal to 6 and the error has the same sign as the original error. Assuming 
that those differences at which there are jumps would have to be approximately equal to 
0, we see that the error in the fourth-order difference is equal to $-6$ (the exact value 0 
minus the tabulated value $+6$) and, consequently, 
that the error in the table is equal to $-6/6 = -1$. 

Finally, it is usually more difficult to determine the error in a table than it was in 
this example, but its position and order of magnitude can be determined in this manner. 

*Strictly speaking, it is only to such functions that the principle of point interpolation 
is applicable. For greater detail, see Section 20 of Chapter 5. 
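
The search for the error in this example can be sketched as follows. The variable names are hypothetical, and the tabulated values are those read from the second table (0 in place of $-1$ at $t = +1$). The jump is divided by the largest binomial coefficient of order four, and the result is treated as the excess of the tabulated value over the exact one.

```python
from math import comb

def differences(values, order):
    """Differences of the given order: each entry minus the preceding one."""
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values

t = [-2, -1, 0, 1, 2, 3]
tabulated = [-22, -5, 0, 0, -2, 3]          # table containing the error

order = 4
fourth = differences(tabulated, order)       # lies on the rows t = 0 and t = +1
rows = t[order // 2: order // 2 + len(fourth)]
jump_row, jump = max(zip(rows, fourth), key=lambda pair: abs(pair[1]))

excess = jump / comb(order, order // 2)      # tabulated minus exact, estimated
corrected = tabulated[t.index(jump_row)] - excess
print(jump_row, jump, excess, corrected)
```

The corrected value restores the cubic; the sign convention for "error" (exact minus tabulated) is an interpretation of the example, not something the sketch can confirm.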


24. THE METHOD OF CONSTRUCTING 
INTERPOLATIONAL FORMULAS FOR 
TABLES WITH A CONSTANT STEP 


Suppose that a function is defined by a table with a constant step 
and that we wish to calculate the approximate value of the function 
at some intermediate value $t$ of the argument not listed in the 
table by means of an interpolational polynomial. We start with 
that listed value of the argument $t_0$ that is closest to the given 
number $t$. This means that $|t - t_0| \le \frac{h}{2}$, where $h$ is the step of 
the table. Here, $t$ may be greater than $t_0$. Since the values of the 
argument ordinarily increase from the initial value listed to the 
last value listed, we must, in such a case, interpolate forward, 
as it is called (from the initial value). If $t < t_0$, we interpolate 
backwards. Finally, if $|t - t_0| = \frac{h}{2}$, we interpolate at the middle. 

In such an interpolational problem, we thus have two parameters 
$t_0$ and $h$. The first depends on the given value of $t$ and on the step; 
the second (namely, the step $h$) is given by the table. It is natural 
to set up interpolational formulas in such a way that they do not 
contain these parameters explicitly. This is done quite simply by 
introducing the normalized argument 

$$\tau = \frac{t - t_0}{h}.$$

Corresponding to the values of the argument 

$$\dots,\ t_0 - 2h,\ t_0 - h,\ t_0,\ t_0 + h,\ t_0 + 2h,\ \dots$$

are the integral values of the normalized argument 

$$\dots,\ -2,\ -1,\ 0,\ +1,\ +2,\ \dots$$

For a table with argument $\tau$ and values of the function 

$$\dots,\ x_{-2},\ x_{-1},\ x_0,\ x_1,\ x_2,\ \dots$$

we construct interpolational formulas that are applicable for all 
values of $t_0$ and $h$. For a normalized table, we shall construct an 
interpolational polynomial of the form 

$$P(\tau) = a_0 + a_1\tau + a_2\tau^2 + \dots$$

that satisfies the conditions of point interpolation: 

$$\dots,\ P(-2) = x_{-2},\ P(-1) = x_{-1},\ P(0) = x_0,\ P(1) = x_1,\ P(2) = x_2,\ \dots$$

[Diagram, page 79 of the original: the extended system of equations, in which each tabulated value and each difference of the function is equated to its expression in terms of the coefficients $a_0, a_1, a_2, \dots$ of $P(\tau)$. The equations along the upper side of the triangle are underlined with solid lines, those along the lower side with dotted lines.]


These conditions lead to a system of equations for determining 
the unknown coefficients. We shall call these equations the basic 
equations. As was shown in the preceding chapter, a direct solution 
of the basic equations is a tedious operation. We shall set up 
supplementary equations that are consequences of the basic 
equations. To do this, let us write expressions for the differences of 
various orders for this polynomial in terms of its unknown 
coefficients. When we equate these expressions to the differences 
of the tabulated function, we shall obtain the extended system of 
simultaneous equations shown in the diagram (page 79 of the original). 

It is clear from the diagram that the first terms are the same 
in each of the columns of the differences of the polynomial. The 
diagram can be made more complete by adding the central 
differences in the spaces between the ordinary differences. We shall 
include only those that we shall use for deriving the formulas of 
greatest use in astronomy. 

Various formulas for the given number of basic points can be 
obtained by choosing in different ways as many equations as there 
are basic points given. (The number of basic points is selected 
by starting with the criterion of near-constancy of differences 
explained above.) 

It is convenient to take one equation from each column of 
differences, including the column of zeroth-order differences, that 
is, the list of values of the function. It should be noted that it is 
always advisable to arrange the interpolational difference 
polynomials according to increasing order of difference and not 
according to the degree of the argument. 
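
Individual rows of the extended system can be generated mechanically: the coefficient of $a_j$ in a difference of $P(\tau)$ is the same difference applied to $\tau^j$ at the integer nodes. The sketch below is an illustration only; the function name `coefficient_row` and its arguments are hypothetical.

```python
from fractions import Fraction
from math import comb

def coefficient_row(order, subscript, degree):
    """Coefficients of a_0, ..., a_degree in the ordinary difference x^order_subscript."""
    top = Fraction(subscript) + Fraction(order, 2)   # largest node entering the difference
    if top.denominator != 1:
        raise ValueError("subscript does not match the order of the difference")
    top = int(top)
    return [sum((-1) ** i * comb(order, i) * (top - i) ** j for i in range(order + 1))
            for j in range(degree + 1)]

# Upper side of the triangle (forward interpolation), cubic polynomial:
for order, s in [(0, 0), (1, Fraction(1, 2)), (2, 1), (3, Fraction(3, 2))]:
    print(order, s, coefficient_row(order, s, 3))
# Lower side of the triangle (backward interpolation):
for order, s in [(1, Fraction(-1, 2)), (2, -1), (3, Fraction(-3, 2))]:
    print(order, s, coefficient_row(order, s, 3))
```

Each printed row lists the multipliers of $a_0, a_1, a_2, a_3$ in one equation of the system, for instance `[0, 0, 2, 6]` for the second-order difference with subscript 1.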


25. NEWTON'S FORMULAS FOR INTERPOLATING 
FORWARD AND BACKWARD 


1. Newton's formula for interpolating forward. 


Suppose that we have a table of values of a function. Let us consider 
that part of it beginning with the particular value $x_0$ corresponding 
to the value $\tau = 0$. The differences formed from this portion of 
the table form an isosceles triangle in the diagram of the preceding 
section. To determine the coefficients of the interpolational 
polynomial, we take the equations along the upper side of this 
triangle. (These are underlined by solid lines on the diagram.) 
For definiteness, we shall confine ourselves to the case of the 
four basic points corresponding to four values of $\tau$, namely, 0, 1, 
2, and 3. We then have the four equations 

$$a_0 = x_0, \qquad a_1 + a_2 + a_3 = x^1_{1/2}, \qquad 2a_2 + 6a_3 = x^2_1, \qquad 6a_3 = x^3_{3/2}.$$

Let us denote the interpolational polynomial by $N^+(\tau)$, where the 
superscript $+$ denotes forward interpolation. In the present case, 

$$N^+(\tau) = a_0 + a_1\tau + a_2\tau^2 + a_3\tau^3.$$

If we express the coefficients in terms of the differences, we obtain 

$$N^+(\tau) = x_0 + \tau\left(x^1_{1/2} - \tfrac{1}{2}x^2_1 + \tfrac{1}{3}x^3_{3/2}\right) + \tau^2\left(\tfrac{1}{2}x^2_1 - \tfrac{1}{2}x^3_{3/2}\right) + \tau^3\cdot\tfrac{1}{6}x^3_{3/2}.$$

If we group the right side according to differences of consecutive 
orders, we obtain Newton's interpolational polynomial in closed 
form: 

$$N^+(\tau) = x_0 + \tau\,x^1_{1/2} + \frac{\tau(\tau-1)}{2}\,x^2_1 + \frac{\tau(\tau-1)(\tau-2)}{6}\,x^3_{3/2}.$$

This polynomial can be generalized in a natural manner to the 
case of an arbitrary number of basic points: consecutive terms 
contain differences of increasing orders with subscripts that 
increase by 1/2 from one term to the next. They are multiplied 
by consecutive coefficients of the binomial series. Thus, in the 
case of $n+1$ basic points, Newton's formula for forward 
interpolation takes the form 

$$N^+(\tau) = x_0 + \sum_{k=1}^{n} \frac{\tau(\tau-1)(\tau-2)\cdots(\tau-k+1)}{k!}\,x^k_{k/2}.$$
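
Newton's forward formula translates directly into a short routine. In this sketch the function name `newton_forward` and the test data are assumptions; `diagonal` holds $x_0$ followed by the differences along the upper side of the triangle, $x^1_{1/2}, x^2_1, x^3_{3/2}, \dots$

```python
def newton_forward(tau, diagonal):
    """Evaluate N+(tau) from x_0 and the descending diagonal of differences."""
    total, coefficient = 0.0, 1.0
    for k, difference in enumerate(diagonal):
        if k > 0:
            coefficient *= (tau - (k - 1)) / k   # tau (tau-1) ... (tau-k+1) / k!
        total += coefficient * difference
    return total

# Illustrative cubic t**3 - 3*t**2 + t tabulated at t = 0, 1, 2, 3:
# values 0, -1, -2, 3; the upper diagonal of differences is -1, 0, 6.
diagonal = [0, -1, 0, 6]
print(newton_forward(0.5, diagonal))
```

Because the tabulated function is itself a cubic, four basic points reproduce it exactly, so the value at $\tau = 0.5$ agrees with the polynomial.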


2. Newton's formula for backward interpolation. 


Let us now consider a part of the table ending with the value of 
the argument $t_0$. That is, let us construct a formula for the table 
with basic points having the values $\dots, -3, -2, -1, 0$. For 
simplicity, we shall confine ourselves to four basic points. To 
determine the coefficients, we use the equations along the lower 
side of the isosceles triangle constituting the diagram on page 79. 
(These are underlined with the dotted line in the diagram.) 

$$a_0 = x_0, \qquad a_1 - a_2 + a_3 = x^1_{-1/2}, \qquad 2a_2 - 6a_3 = x^2_{-1}, \qquad 6a_3 = x^3_{-3/2}.$$

The solution of these equations is 

$$a_0 = x_0, \qquad a_1 = x^1_{-1/2} + \tfrac{1}{2}x^2_{-1} + \tfrac{1}{3}x^3_{-3/2}, \qquad a_2 = \tfrac{1}{2}x^2_{-1} + \tfrac{1}{2}x^3_{-3/2}, \qquad a_3 = \tfrac{1}{6}x^3_{-3/2}.$$

Let us substitute these coefficients into the interpolational 
polynomial, which we shall denote by $N^-(\tau)$. (The superscript denotes 
backward interpolation.) If we arrange the polynomial first in 
powers of $\tau$ and then according to increasing order of the 
differences, we obtain 

$$N^-(\tau) = x_0 + \tau\,x^1_{-1/2} + \frac{\tau(\tau+1)}{2}\,x^2_{-1} + \frac{\tau(\tau+1)(\tau+2)}{6}\,x^3_{-3/2}.$$

With backward interpolation, $\tau$ is always negative because $t < t_0$. 
Instead of $\tau$, let us use the absolute value of that number, which 
we denote by $u = -\tau$. We then have 

$$N^-(u) = x_0 - u\,x^1_{-1/2} + \frac{u(u-1)}{2}\,x^2_{-1} - \frac{u(u-1)(u-2)}{6}\,x^3_{-3/2}.$$

In this form, the formula differs from the formula for $N^+(\tau)$ only 
by the alternation of signs in front of consecutive terms. 

In the case of $n+1$ basic points, Newton's formula for 
interpolating backward is of the form 

$$N^-(u) = x_0 + \sum_{k=1}^{n} (-1)^k\,\frac{u(u-1)\cdots(u-k+1)}{k!}\,x^k_{-k/2}.$$

The formula for $N^-(\tau)$ is often called the formula with 
increasing differences and that for $N^+(\tau)$ is called the formula with 
decreasing differences. Let us note that neither of Newton's formulas 
changes its structure if the initial value of the part of the table 
that we are examining has a subscript $k$ different from 0. In 
that case, we would need to add $k$ to all the subscripts on the 
right side of the formula. 
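
The backward formula can be sketched in the same way. The function name `newton_backward` and the data are hypothetical; `diagonal` holds $x_0$ followed by the differences along the lower side of the triangle, $x^1_{-1/2}, x^2_{-1}, x^3_{-3/2}, \dots$, and $\tau$ is negative.

```python
def newton_backward(tau, diagonal):
    """Evaluate N-(tau) from x_0 and the ascending diagonal of differences."""
    total, coefficient = 0.0, 1.0
    for k, difference in enumerate(diagonal):
        if k > 0:
            coefficient *= (tau + (k - 1)) / k   # tau (tau+1) ... (tau+k-1) / k!
        total += coefficient * difference
    return total

# Illustrative function tau**3 tabulated at tau = -3, -2, -1, 0:
# values -27, -8, -1, 0; the lower diagonal of differences is 1, -6, 6.
diagonal = [0, 1, -6, 6]
print(newton_backward(-0.5, diagonal))
```

Writing $u = -\tau$ and pulling the sign out of each factor gives the alternating form $N^-(u)$ shown above; the routine keeps $\tau$ itself to avoid a second sign convention.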

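Example 1. Determine the right ascension of the moon on January 2, 1950 at 0630 
universal time. 

From the astronomical calendar for 1950, we write out $\alpha$ for $t = 2, 3, 4, 5, 6, 7$ (days 
of January) and let us compile a table of the differences (in seconds of time): 

$$
\begin{array}{l|l|rrrrrrr}
t & \alpha \\ \hline
\text{January } 2 & 4^h\,47^m\,04^s \\
 & & 3376^s \\
3 & 5\ \ 43\ \ 20 & & +111^s \\
 & & 3487 & & -99^s \\
4 & 6\ \ 41\ \ 27 & & +12 & & +7^s \\
 & & 3499 & & -92 & & +27^s \\
5 & 7\ \ 39\ \ 46 & & -80 & & +34 & & -12^s \\
 & & 3419 & & -58 & & +15 & & -11^s \\
6 & 8\ \ 36\ \ 45 & & -138 & & +49 & & -23 \\
 & & 3281 & & -9 & & -8 \\
7 & 9\ \ 31\ \ 26 & & -147 & & +41 \\
 & & 3134 & & +32 \\
8 & 10\ \ 23\ \ 40 & & -115 \\
 & & 3019 \\
9 & 11\ \ 13\ \ 59
\end{array}
$$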


To see if six basic points are enough, let us make a crude estimate of the coefficient of 
the fifth difference in Newton's formula. We have $\tau \approx \frac{1}{4}$. Therefore, the coefficient of 
the fifth difference is equal to 

$$\frac{\frac{1}{4}\left(-\frac{3}{4}\right)\left(-\frac{7}{4}\right)\left(-\frac{11}{4}\right)\left(-\frac{15}{4}\right)}{120} \approx \frac{3\cdot 2\cdot 3\cdot 4}{16\cdot 120}.$$

We may take this coefficient as being approximately equal to 0.04, and, therefore, the 
term with the fifth difference gives a value greater than $1^s$. 

Since the limiting error of the table is equal to $0.5^s$, six basic points are not sufficient 
because the term with the sixth-order difference can give a number that is greater than 
$0.5^s$. Therefore, let us extend the table to include seven and, as a check, eight basic 
points. The coefficient of the sixth difference is approximately equal to the product 
of 0.04 and the factor $(5 - \tau)/6$, which we may take as being roughly equal to 0.03. The term with the sixth 
difference gives a number of the order of $0.4^s$, which is less than the error of the tables. 

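We arrange the calculations according to the following plan: 

$$
\begin{array}{l|l|l|r|l}
\multicolumn{3}{c|}{\text{Coefficients}} & \text{Differences} & \text{Terms of the formula} \\ \hline
\tau = 0.27083 & & N_1 = +0.27083 & +3376^s & \Delta_1 = +914^s.3 \\
\tau - 1 = -0.72917 & \frac{\tau-1}{2} = -0.3646 & N_2 = -0.0987 & +111 & \Delta_2 = -11.0 \\
\tau - 2 = -1.72917 & \frac{\tau-2}{3} = -0.5764 & N_3 = +0.0569 & -99 & \Delta_3 = -5.6 \\
\tau - 3 = -2.72917 & \frac{\tau-3}{4} = -0.682 & N_4 = -0.039 & +7 & \Delta_4 = -0.3 \\
\tau - 4 = -3.72917 & \frac{\tau-4}{5} = -0.746 & N_5 = +0.029 & +27 & \Delta_5 = +0.8 \\
\tau - 5 = -4.72917 & \frac{\tau-5}{6} = -0.788 & N_6 = -0.023 & -12 & \Delta_6 = +0.3 \\ \hline
 & & & & \Delta = +898^s.5 \\
 & & & & \Delta = 14^m\,58^s.5 \\
 & & & & \alpha_0 = 4^h\,47^m\,04^s \\
 & & & & \alpha = 5^h\,02^m\,02^s
\end{array}
$$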


As an exercise, let us see if it is possible in the above calculations to stop at a fairly 
small descending fourth-order difference. An estimate of the coefficient gives 

$$\frac{3\cdot 2\cdot 3}{16\cdot 24} \approx 0.05.$$

The term with the fourth difference yields 0.35, which is less than the tabular error. 
This calculation shows that we may not always trust the original estimate. We must 
see whether the smallness of the difference is stable, since a small difference can arise 
from the fact that we happened on a sign change in the differences of the given order in 
the table. Therefore, when we have obtained a small difference of some particular order 
and have shown that it can have only a small effect on the result, we need to check the 
subsequent terms. In the example that we have been considering, we carried out our 
calculation up to the seventh-order difference. It is almost the same as the sixth-order 
difference. Therefore, we may stop at the term with the sixth-order difference. 

It is useful to make the following clarifying remarks about the system of calculations. 
Suppose that we have a complete system of calculation that can be somewhat shortened 
by use of a calculating machine. The factors in the numerators of the coefficients are 
put in the first column. The factors of the entire coefficients appear in the following 
column; these are written in the form 

$$N_1 = \tau, \qquad N_2 = N_1\,\frac{\tau-1}{2}, \qquad N_3 = N_2\,\frac{\tau-2}{3}, \quad \text{etc.}$$

From these formulas, we obtain the successive coefficients of the formula: $N_1, N_2, \dots, N_6$. 

The coefficients are multiplied by the differences of the same order. We thus obtain 
successive terms (beginning with the second) of the formula, which we denote by $\Delta_1, 
\Delta_2, \dots, \Delta_6$. 

The calculations are carried out with a reserve digit (the tenth parts of a second) 
in order to decrease the accumulation of errors from the individual terms of the formula. 
Therefore, we calculate $\tau = N_1$ with five digits to the right of the decimal in order that 
we may vouch for the tenth parts of a second after we multiply by the four-digit value of 
the first difference. This operation is legitimate since $\tau$ is given and, consequently, 
this number can be treated as exact. Since the second-order difference has three digits 
and the third-order difference differs only slightly from it, the coefficients of these 
are calculated to four decimal places (with conservation of accuracy). The differences 
of the remaining orders are two-digit. Therefore, it is sufficient to take their coefficients 
with three decimal places. In the fourth column, the successive differences are written 
for convenience. We do not have to complete this column if the table of differences is 
placed close to the diagram. (If the calculation is done by a technician, it is best to 
write out this column because the column of second factors in the interpolational formula 
is situated along a diagonal line and the column of first factors along a vertical line, 
which can be a source of errors.) The last column contains successive terms of the 
formula. When they are added with the tenth parts kept and the sum is rounded off to the 
nearest second, we obtain a number that must be added to the original value. The individual 
terms do not have to be calculated in those problems in which a sufficient number of 
basic points are known from the preceding experiment to ensure the necessary accuracy. 
If this number is not known, the last column is necessary since we need to know the 
individual consecutive terms of the formula in order to know the greatest order of the 
differences that have an effect on the result of the interpolation. As was stated above, 
this determines the degree of the polynomial, that is, the number of basic points. In 
the present case, the term with the fourth-order difference does not have an effect on 
the result, but the following term yields almost a unit. Therefore, we still need to take 
the sixth-order difference and consequently seven basic points. 
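
The calculating scheme of Example 1 can be retraced with a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, and the fifth difference is taken as $+27$, which is the value consistent with the listed term $+0.8$ and with the neighboring fourth differences.

```python
def newton_forward_terms(tau, differences):
    """Terms N_k * (k-th difference), with N_k = N_{k-1} (tau - k + 1) / k."""
    terms, coefficient = [], 1.0
    for k, difference in enumerate(differences, start=1):
        coefficient *= (tau - (k - 1)) / k
        terms.append(coefficient * difference)
    return terms

tau = 6.5 / 24                            # 0630 UT as a fraction of the one-day step
diagonal = [3376, 111, -99, 7, 27, -12]   # seconds of time, upper diagonal of the table
terms = newton_forward_terms(tau, diagonal)
increment = sum(terms)                    # to be added to the value for January 2

print([round(term, 1) for term in terms])
print(round(increment, 1))
```

The individual terms show directly which orders of difference still affect the tenths of a second, which is the purpose of the last column in the scheme.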

Example 2. Determine the declination of the moon on December 29, 1950 at 1745 
universal time. For the initial value of the argument, we take the first instant of 
December 30th, which is close to the given instant. In this case, we need to interpolate 
backwards. From the astronomical calendar, we copy the table of values of $t$ and 
$\delta$, and we form a table of the differences (in minutes of arc) as follows: 

$$
\begin{array}{l|l|rrrrrr}
t & \delta \\ \hline
\text{December } 24 & +28^\circ\,23'.0 \\
 & & -27'.1 \\
25 & \ \ 27\ \ 55.9 & & -80'.0 \\
 & & -107.1 & & +6'.0 \\
26 & \ \ 26\ \ \ 8.8 & & -74.0 & & +3'.6 \\
 & & -181.1 & & +9.6 & & -1'.5 \\
27 & \ \ 23\ \ \ 7.7 & & -64.4 & & +2.1 & & +0'.2 \\
 & & -245.5 & & +11.7 & & -1.3 \\
28 & \ \ 19\ \ \ 2.2 & & -52.7 & & +0.8 \\
 & & -298.2 & & +12.5 \\
29 & \ \ 14\ \ \ 4.0 & & -40.2 \\
 & & -338.4 \\
30 & \ \ \ 8\ \ 25.6
\end{array}
$$


The sixth-order difference is sufficiently small, and we may therefore confine ourselves 
to seven basic points. 

In this case, $t - t_0 = -6^h15^m$ and $\tau = -0.26042$. We follow the calculating procedure 
indicated in example 1: 


   Auxiliary quantities                    Coefficients     Differences   Terms of the formula
τ     = -0.26042                           N1 = -0.26042     -338'.4       Δ1 = +88'.13
τ + 1 =  0.7396;  (τ + 1)/2 = 0.3698       N2 = -0.0963       -40.2        Δ2 =  +3.87
τ + 2 =  1.7396;  (τ + 2)/3 = 0.580        N3 = -0.0559       +12.5        Δ3 =  -0.70
τ + 3 =  2.7396;  (τ + 3)/4 = 0.68         N4 = -0.04          +0.8        Δ4 =  -0.03
τ + 4 =  3.7396;  (τ + 4)/5 = 0.75         N5 = -0.03          -1.3        Δ5 =  +0.04
τ + 5 =  4.7396;  (τ + 5)/6 = 0.79         N6 = -0.02          +0.2        Δ6 =  -0.00

                                           δ0 = 8°25'.6                    Δ  = +91'.3
                                           Δ  = 1°31'.3
                                           δ  = 9°56'.9
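The calculation of Example 2 can be retraced with a short program. The following is only a sketch: the function names are illustrative (they do not come from the text), the declinations are converted to minutes of arc, and the coefficients are built by the recurrence $N_k = N_{k-1}(\tau + k - 1)/k$ that the table above uses.

```python
# Declination of the moon in minutes of arc, December 24 ... 30, 1950, 0h UT
# (values copied from the table of Example 2).
delta = [28 * 60 + 23.0, 27 * 60 + 55.9, 26 * 60 + 8.8, 23 * 60 + 7.7,
         19 * 60 + 2.2, 14 * 60 + 4.0, 8 * 60 + 25.6]


def backward_differences(values):
    """Return [x_0, d1, d2, ...]: the last entry of each difference column."""
    column, last = list(values), [values[-1]]
    while len(column) > 1:
        column = [b - a for a, b in zip(column, column[1:])]
        last.append(column[-1])
    return last


def newton_backward(values, tau):
    """Newton's backward formula; tau = (t - t0)/h, t0 being the last node."""
    diffs = backward_differences(values)
    result, coeff = diffs[0], 1.0
    for k in range(1, len(diffs)):
        coeff *= (tau + k - 1) / k      # N_k = N_(k-1) * (tau + k - 1) / k
        result += coeff * diffs[k]
    return result


tau = -(6 + 15 / 60) / 24               # 6h15m before December 30, 0h
value = newton_backward(delta, tau)     # declination in minutes of arc
degrees, minutes = divmod(value, 60)
print(f"{int(degrees)} deg {minutes:.1f} min")
```

Because the program carries all digits instead of rounding each coefficient, its individual terms differ slightly from the tabulated ones, but the sum agrees with the value 9°56'.9 found above.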


26. STIRLING’S FORMULA 


The line containing the initial value and the ordinary differences 
of even order with subscript 0 will be designated as the central 
line of the table. Let us fill out the table with the central differ- 
ences of odd order that appear on the same line and let us take 
the equations located on the central line. If we confine ourselves 
to five basic points, that is, if we consider only the fourth-order 
differences, we obtain the following system of equations for 
determining the coefficients of the polynomial: 

$$a_0 = x_0, \qquad a_1 + a_3 = x_0^{1}, \qquad 2a_2 + 2a_4 = x_0^{2}, \qquad 6a_3 = x_0^{3}, \qquad 24a_4 = x_0^{4},$$

whence 

$$a_0 = x_0, \qquad a_1 = x_0^{1} - \frac{1}{6}x_0^{3}, \qquad a_2 = \frac{1}{2}x_0^{2} - \frac{1}{24}x_0^{4}, \qquad a_3 = \frac{1}{6}x_0^{3}, \qquad a_4 = \frac{1}{24}x_0^{4}.$$

The interpolational polynomial is of the form 

$$S(\tau) = x_0 + \tau\left(x_0^{1} - \frac{1}{6}x_0^{3}\right) + \tau^2\left(\frac{1}{2}x_0^{2} - \frac{1}{24}x_0^{4}\right) + \tau^3\,\frac{1}{6}x_0^{3} + \tau^4\,\frac{1}{24}x_0^{4}.$$




If we expand it in differences of consecutive orders, we obtain 
Stirling’s formula in its usual form: 

$$S(\tau) = x_0 + \tau x_0^{1} + \frac{\tau^2}{2!}x_0^{2} + \frac{\tau(\tau^2 - 1)}{3!}x_0^{3} + \frac{\tau^2(\tau^2 - 1)}{4!}x_0^{4}.$$

When a complete study of the principle of point interpolation is 
made, an odd number of basic points is always chosen for Stirling’s 
formula, namely, the initial basic point and the basic points that 
are symmetric to it preceding and following it. Therefore, 
Stirling’s polynomial always is of even degree. The law of formation 
of the consecutive terms of the polynomial, beginning with the 
third, is as follows: a coefficient of an even-order difference is 
obtained from the preceding coefficient (of an odd-order difference) 
by adding 1 to the argument of the factorial in the denominator 
and by multiplying the numerator by $\tau$. The coefficient of a 
difference of odd order, let us say of order $2k + 1$, is obtained 
from the coefficient of the preceding difference of odd order by adding 2 to the 
argument of the factorial in the denominator and by multiplying 
the numerator by the factor $\tau^2 - k^2$. For example, the terms with the fifth- and 
sixth-order differences are of the form 

$$\frac{\tau(\tau^2 - 1)(\tau^2 - 2^2)}{(3 + 2)!}\,x_0^{5} = \frac{\tau(\tau^2 - 1)(\tau^2 - 2^2)}{5!}\,x_0^{5},$$

$$\frac{\tau(\tau^2 - 1)(\tau^2 - 2^2)\,\tau}{(5 + 1)!}\,x_0^{6} = \frac{\tau^2(\tau^2 - 1)(\tau^2 - 2^2)}{6!}\,x_0^{6}.$$


The law of formation of the coefficients in Stirling’s formula is 
such that the coefficients do not change uniformly. For example, 
for $\tau = \tfrac{1}{2}$ the consecutive coefficients, beginning with that of the 
third-order difference, are equal to 

$$-\frac{1}{16}, \quad -\frac{1}{128}, \quad \frac{3}{256}, \quad \frac{1}{1024}, \quad \ldots$$
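The law of formation just described lends itself to a small recurrence. The sketch below (the function name is illustrative, not from the text) builds the coefficients $S_1, S_2, \ldots$ of Stirling's formula in exact fractions, so that their uneven decrease at $\tau = \tfrac{1}{2}$ can be inspected directly.

```python
from fractions import Fraction


def stirling_coefficients(tau, order):
    """Coefficients S_1 .. S_order of Stirling's formula for the argument tau."""
    coeffs = [tau]                                   # S_1 = tau
    for n in range(2, order + 1):
        if n % 2 == 0:
            # even order: previous (odd-order) coefficient times tau / n
            coeffs.append(coeffs[-1] * tau / n)
        else:
            # odd order n = 2k + 1: previous odd-order coefficient
            # times (tau^2 - k^2) / (n (n - 1))
            k = (n - 1) // 2
            coeffs.append(coeffs[-2] * (tau * tau - k * k) / (n * (n - 1)))
    return coeffs


half = stirling_coefficients(Fraction(1, 2), 6)
print([str(c) for c in half])
```

The last four entries are the values quoted above; the first two are the coefficients $\tfrac{1}{2}$ and $\tfrac{1}{8}$ of the first- and second-order differences.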


The signs of the coefficients alternate in pairs; that is, there 
are two positive terms, two negative, two positive, etc. Each 
coefficient of an odd-order difference, beginning with the fifth, 
is greater in absolute value than the preceding coefficient of an even-order 
difference. Therefore, in using Stirling’s formula one should not stop 
at an odd-order difference since the corresponding term can be 
greater than the preceding one if the differences decrease slowly. 

Furthermore, if Stirling’s formula is cut off at an odd differ- 
ence, it will not represent exactly the highest or lowest of the 
basic points used. 

In practice, however, the formula is often cut off at an odd 
difference since an arbitrary coefficient of an even difference 
is much less than the preceding coefficient of an odd difference. 
For example, if the term with the fifth difference gives five units 
of the last digit, then, for $\tau = \tfrac{1}{2}$, the term with the sixth difference 
yields less than one half a unit of the last digit and, conse- 
quently, it can have an effect only on the reserve digit and cannot 
change the result by more than a unit of the last digit (if the reserve 
digit is considered in all terms). 

Because of the symmetry of Stirling’s formula about the line 
passing through the initial basic point, this formula can be applied 
without change in forward interpolation ($\tau > 0$) and backward inter- 
polation ($\tau < 0$). 

Example. Determine the right ascension of the moon on December 13, 1950 at $9^h36^m$ 
universal time. 
For the initial instant, we take December 13 at $0^h00^m$ universal time. Let us write the 
value of $\alpha$ at three preceding and three succeeding instants and let us draw up a table of 
the ordinary differences: 


                                       Differences of order
    t               α          1st      2nd     3rd     4th     5th     6th
December 10    17h41m57s
                              4111s
         11    18 50 28                -205s
                              3906              -111s
         12    19 55 34                -316              +97s
                              3590               -14             -30s
         13    20 55 24                -330              +67             -15s
                              3260               +53             -45
         14    21 49 44                -277              +22
                              2983               +75
         15    22 39 27                -202
                              2781
         16    23 25 48


Let us now write the central differences of odd order on the line passing through the 
origin. We carry out the calculation of the coefficients in Stirling’s formula 

$$S(\tau) = x_0 + S_1x_0^{1} + S_2x_0^{2} + S_3x_0^{3} + S_4x_0^{4} + S_5x_0^{5} + S_6x_0^{6}$$

by use of the equations 

$$S_1 = \tau, \quad S_2 = S_1\frac{\tau}{2}, \quad S_3 = S_1\frac{\tau^2 - 1}{6}, \quad S_4 = S_3\frac{\tau}{4}, \quad S_5 = S_3\frac{\tau^2 - 4}{20}, \quad S_6 = S_5\frac{\tau}{6}.$$

We indicate the calculations that we must do for the interpolation in the following table: 

      Auxiliary quantities                          Coefficients     Differences   Terms of the formula
τ      =  0.40000    τ/4         =  0.1000          S1 = +0.40000      +3425s        +1370s.0
τ²     =  0.1600     τ/2         =  0.2000          S2 = +0.0800        -330           -26.4
τ² - 1 = -0.8400     (τ² - 1):6  = -0.1400          S3 = -0.056          +20            -1.1
τ² - 4 = -3.8400     (τ² - 4):20 = -0.192           S4 = -0.006          +67            -0.4
                     τ:6         =  0.0667          S5 = +0.011          -38            -0.4
                                                    S6 = +0.001          -15            -0.0

                                                    α0 = 20h55m24s               Δ = +1342s
                                                    Δ  =     22m22s
                                                    α  = 21h17m46s

Here, $\Delta$ denotes the sum of the terms in Stirling’s formula that contain differences from 
the first to the sixth order. This number must be added to the initial value $\alpha_0$ to obtain 
the desired value of the right ascension. 
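As a cross-check on the example, Stirling's formula can be evaluated mechanically from the seven tabulated right ascensions. The code below is a sketch under stated assumptions: the helper names are hypothetical, times are converted to seconds, and the odd-order central differences are taken as the mean of the two differences adjacent to the central line, as in the example.

```python
# Right ascension of the moon, December 10 ... 16, 1950, 0h UT (h, m, s).
alpha = [(17, 41, 57), (18, 50, 28), (19, 55, 34), (20, 55, 24),
         (21, 49, 44), (22, 39, 27), (23, 25, 48)]
x = [h * 3600 + m * 60 + s for h, m, s in alpha]


def difference_table(values):
    """table[n] is the column of differences of order n."""
    table = [list(values)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table


def stirling(values, tau):
    """Stirling's formula for an odd number of nodes centred on the initial one."""
    table = difference_table(values)
    mid = len(values) // 2
    result = table[0][mid]
    odd_coeff = tau
    for n in range(1, len(values)):
        j = n // 2
        if n % 2:   # odd order: mean of the two differences next to the central line
            diff = 0.5 * (table[n][mid - j - 1] + table[n][mid - j])
            if n > 1:
                odd_coeff *= (tau * tau - j * j) / (n * (n - 1))
            coeff = odd_coeff
        else:       # even order: the difference lying on the central line
            diff = table[n][mid - j]
            coeff = odd_coeff * tau / n
        result += coeff * diff
    return result


alpha_seconds = stirling(x, 0.4)            # tau = 9h36m / 24h = 0.4
h, rem = divmod(round(alpha_seconds), 3600)
m, s = divmod(rem, 60)
print(f"{h}h{m:02d}m{s:02d}s")
```

Rounded to the nearest second, the program reproduces the right ascension obtained in the table.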


27. BESSEL'S FORMULA (TWO VARIANTS) 


1. The first variant. 


The value of the argument that is given for interpolation is included 
between two tabulated values, denoted by $t_0$ and $t_1$ 
in the diagram of Section 24. Let us take as a starting point for 
the calculation the average value of the argument, $\tfrac{1}{2}(t_0 + t_1)$. Ac- 
cordingly, we introduce the argument 

$$\tau' = \frac{t - \tfrac{1}{2}(t_0 + t_1)}{h} = \tau - \frac{1}{2},$$

which takes the values 

$$\tau'_k = k - \frac{1}{2}, \quad \text{that is,} \quad \ldots, -\frac{3}{2}, -\frac{1}{2}, +\frac{1}{2}, +\frac{3}{2}, \ldots$$

at the basic points. Here, $t_k = t_0 + kh$, where the $k$ are negative 
and positive integers. 

We limit ourselves to the four basic points $t_{-1}$, $t_0$, $t_1$, and $t_2$; 
that is, we shall construct a third-degree interpolational poly- 
nomial of the form 

$$B(\tau') = b_0 + b_1\tau' + b_2\tau'^2 + b_3\tau'^3.$$

Let us set up a system of equations analogous to that used in 
Section 19, but with argument $\tau'$ (see page 89). 

Let us take the equations situated on the middle line between 
the basic points $t_0$ and $t_1$: 

$$b_0 + \frac{1}{4}b_2 = x_{1/2}, \qquad b_1 + \frac{1}{4}b_3 = x_{1/2}^{1}, \qquad 2b_2 = x_{1/2}^{2}, \qquad 6b_3 = x_{1/2}^{3}.$$

Solution of this system yields 

$$b_3 = \frac{1}{6}x_{1/2}^{3}, \qquad b_2 = \frac{1}{2}x_{1/2}^{2}, \qquad b_1 = x_{1/2}^{1} - \frac{1}{24}x_{1/2}^{3}, \qquad b_0 = x_{1/2} - \frac{1}{8}x_{1/2}^{2}.$$


When we substitute these coefficients into the polynomial 

$$B(\tau') = b_0 + b_1\tau' + b_2\tau'^2 + b_3\tau'^3$$

and then expand it in differences of increasing orders, we obtain 
Bessel’s formula 

$$B(\tau') = x_{1/2} + \tau' x_{1/2}^{1} + \frac{\tau'^2 - \frac{1}{4}}{2!}\,x_{1/2}^{2} + \frac{\tau'\left(\tau'^2 - \frac{1}{4}\right)}{3!}\,x_{1/2}^{3}.$$


This formula can quite easily be generalized to the case of an 
arbitrary even number of basic points. For example, a fifth- 
degree interpolational polynomial is of the form 

$$B(\tau') = x_{1/2} + \tau' x_{1/2}^{1} + \frac{\tau'^2 - 0.25}{2!}\,x_{1/2}^{2} + \frac{\tau'(\tau'^2 - 0.25)}{3!}\,x_{1/2}^{3} + \frac{(\tau'^2 - 0.25)(\tau'^2 - 2.25)}{4!}\,x_{1/2}^{4} + \frac{\tau'(\tau'^2 - 0.25)(\tau'^2 - 2.25)}{5!}\,x_{1/2}^{5}.$$


The law of formation of the coefficients is quite simple: a coef- 
ficient of an odd-order difference is obtained from the coefficient of the preceding 
even-order difference by adding 1 to the argument of the factorial 
in the denominator and multiplying the numerator by $\tau'$. The 
coefficient of an even-order difference, let us say of order $2k$, is 
obtained from the coefficient of the preceding even-order difference 
by adding 2 to the argument of the factorial in the denominator 
and multiplying the numerator by 

$$\tau'^2 - \frac{(2k - 1)^2}{4}.$$

We note that, in the general formula, the free terms of the bi- 
nomials in the numerator are of the form 

$$\frac{1}{4}, \qquad \frac{3^2}{4} = \frac{9}{4}, \qquad \frac{5^2}{4} = \frac{25}{4}, \qquad \frac{7^2}{4} = \frac{49}{4}, \qquad \text{etc.}$$

It is easy to write the general expression for the terms with 
even- and odd-order differences. The coefficient of $x_{1/2}^{2k}$ is equal to 

$$\frac{\left(\tau'^2 - \frac{1}{4}\right)\left(\tau'^2 - \frac{9}{4}\right) \cdots \left[\tau'^2 - \frac{(2k - 1)^2}{4}\right]}{(2k)!};$$

the coefficient of $x_{1/2}^{2k+1}$ is equal to 

$$\frac{\tau'\left(\tau'^2 - \frac{1}{4}\right)\left(\tau'^2 - \frac{9}{4}\right) \cdots \left[\tau'^2 - \frac{(2k - 1)^2}{4}\right]}{(2k + 1)!}.$$

It is clear from these formulas that Bessel’s formula is 
especially convenient for $\tau' = 0$, that is, for $t = \tfrac{1}{2}(t_0 + t_1)$. In this 
case, the interpolation is known as ‘‘interpolation on an average.’’ 
The coefficients of all odd-order differences in Bessel’s formula 
then vanish and the number of remaining terms is halved, with the 
result that the number of computations and the resulting error 
are decreased. The formula can be applied without change for $t$ 
close to $t_0$ ($\tau' < 0$) and for $t$ close to $t_1$ ($\tau' > 0$). For example, if 
$t = t_0 + 0.25h$, we shall have $\tau' = -0.25$, and if $t = t_1 - 0.25h = t_0 + 0.75h$, 
we shall have $\tau' = +0.25$. 


2. The second variant 


Let us substitute $\tau - 0.5$ for $\tau'$ in Bessel’s formula; that is, let us 
measure the normalized argument from the initial basic point. In 
the coefficients of the differences of successive orders (other than 
the first) there are factors of the form 

$$\tau'^2 - \frac{(2m - 1)^2}{4}.$$

Rewriting these factors in terms of the argument $\tau$, we obtain 

$$\tau'^2 - \frac{(2m - 1)^2}{4} = (\tau - m)(\tau + m - 1).$$

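Bessel’s formula with the new argument is written as 

$$B(\tau) = x_{1/2} + \left(\tau - \frac{1}{2}\right)x_{1/2}^{1} + \frac{\tau(\tau - 1)}{2!}\,x_{1/2}^{2} + \frac{\tau(\tau - 1)\left(\tau - \frac{1}{2}\right)}{3!}\,x_{1/2}^{3} + \frac{(\tau + 1)\tau(\tau - 1)(\tau - 2)}{4!}\,x_{1/2}^{4} + \frac{(\tau + 1)\tau(\tau - 1)(\tau - 2)\left(\tau - \frac{1}{2}\right)}{5!}\,x_{1/2}^{5}.$$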

This formula can also be obtained from the diagram of Section 
19 if we single out the equations situated on the middle line between 
the initial basic point and the subsequent basic point. It is sug- 
gested that the reader do this himself and thus check the formula 
just given. 

We note a modification of the second variant of Bessel’s 
formula that is often used in practice. Let us make the simple 
transformations 

$$x_{1/2} = \frac{1}{2}(x_0 + x_1) = x_0 + \frac{1}{2}x_{1/2}^{1}.$$

If we replace the first term in Bessel’s polynomial with its value 
given by this equation and if we reduce the terms, we obtain the 
following formula, which differs from the above only in the first 
two terms: 

$$B(\tau) = x_0 + \tau x_{1/2}^{1} + \frac{\tau(\tau - 1)}{2!}\,x_{1/2}^{2} + \ldots$$


Example. Determine the right ascension of the moon on December 13, 1950 at $9^h36^m$ 
universal time. This instant of time lies between the tabulated instants December 13.0 and 
December 14.0. Let us write the tabulated values symmetrically about the middle line and 
let us calculate the differences that we need for interpolation: 


                                       Differences of order
    t               α          1st      2nd     3rd     4th     5th
December 11    18h50m28s
                              3906s
         12    19 55 34                -316s
                              3590              -14s
         13    20 55 24                -330             +67s
               21 22 34       3260     -304     +53     +44     -45s
         14    21 49 44                -277             +22
                              2983              +75
         15    22 39 27                -202
                              2781
         16    23 25 48

When we determine $\tau$ and $\tau'$, we get $\tau = 0.4$ and $\tau' = 0.4 - 0.5 = -0.1$. The calcula- 
tions can be made as indicated below: 

   Auxiliary quantities               Coefficients     Differences   Terms of the formula
τ'                = -0.1              B1 = -0.1          +3260s        Δ1 = -326s.0
τ'² - 0.25        = -0.24             B2 = -0.1200        -304         Δ2 =  +36.5
τ' : 3            = -0.0333           B3 = +0.004          +53         Δ3 =   +0.2
τ'² - 2.25        = -2.24             B4 = +0.022          +44         Δ4 =   +1.0
(τ'² - 2.25) : 12 = -0.187            B5 = -0.0004         -45         Δ5 =   +0.0
τ' : 5            = -0.02
                                                                       Δ  = -288s = -4m48s
                                      α(1/2) = 21h22m34s
                                      α      = 21h17m46s


In this example, calculations by using Bessel’s formula turned out to be somewhat 
simpler than those using Stirling’s formula. 
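The same example can be retraced with Bessel's formula (first variant). This is a sketch with hypothetical helper names; it uses the six tabulated values from December 11 to 16, takes the means of the even-order differences on the two lines adjacent to the middle line, and builds the coefficients by the law of formation stated earlier in this section.

```python
# Right ascension of the moon, December 11 ... 16, 1950, 0h UT (h, m, s).
alpha = [(18, 50, 28), (19, 55, 34), (20, 55, 24),
         (21, 49, 44), (22, 39, 27), (23, 25, 48)]
x = [h * 3600 + m * 60 + s for h, m, s in alpha]


def difference_table(values):
    """table[n] is the column of differences of order n."""
    table = [list(values)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table


def bessel(values, tau_prime):
    """Bessel's formula (first variant) for an even number of nodes;
    tau_prime is measured from the middle of the two central nodes."""
    table = difference_table(values)
    lo = len(values) // 2 - 1               # index of the initial basic point t_0
    result, even_coeff = 0.0, 1.0
    for n in range(len(values)):
        j = n // 2
        if n % 2 == 0:                      # even order: mean of two differences
            if n > 0:
                even_coeff *= (tau_prime ** 2 - (2 * j - 1) ** 2 / 4) / (n * (n - 1))
            coeff = even_coeff
            diff = 0.5 * (table[n][lo - j] + table[n][lo - j + 1])
        else:                               # odd order: difference on the middle line
            coeff = even_coeff * tau_prime / n
            diff = table[n][lo - j]
        result += coeff * diff
    return result


alpha_seconds = bessel(x, 0.4 - 0.5)        # tau = 0.4, tau' = -0.1
h, rem = divmod(round(alpha_seconds), 3600)
m, s = divmod(rem, 60)
print(f"{h}h{m:02d}m{s:02d}s")
```

To the nearest second, the result coincides with the value found by Stirling's formula, although the two formulas use slightly different sets of basic points.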

It should be noted here that the coefficient $B_2$ is somewhat greater than $B_1$ and that 
the coefficient $B_4$ is five times as great as $B_3$ (in absolute value). It can be shown that the 
coefficients of odd-order differences are always much smaller than the coefficients of the 
immediately preceding even-order differences. From Bessel’s general formula (first variant), we have 

$$\frac{B_{2m+1}}{B_{2m}} = \frac{\tau'}{2m + 1},$$

so that 

$$\left|\frac{B_{2m+1}}{B_{2m}}\right| < \frac{1}{2(2m + 1)},$$

since $|\tau'| < 0.5$. (We cannot have equality because $|\tau'| = 0.5$ indicates that the tabulated 
value of the argument is taken.) Furthermore, 

$$\frac{B_{2m}}{B_{2m-1}} = \frac{\tau'^2 - \frac{(2m - 1)^2}{4}}{2m\tau'}.$$

Let us denote this ratio by the letter $u$. It is a function of $\tau'$ and depends on the param- 
eter $m$. Since 

$$\frac{du}{d\tau'} = \frac{\tau'^2 + \frac{(2m - 1)^2}{4}}{2m\tau'^2} > 0$$

for all values of $\tau'$, the ratio $u$ increases monotonically for all values of $\tau'$. If $\tau'$ is 
nonpositive, it takes values from $-0.5$ to 0. Therefore, at $\tau' = -0.5$, we have the 
minimum value of $u$: 

$$u_{\min} = \frac{\frac{1}{4} - \frac{(2m - 1)^2}{4}}{-m} = m - 1;$$

for $\tau' = 0$, $u = \infty$. 

If $\tau'$ is nonnegative, $u$ will be negative and will take values from $-\infty$ to 
$-(m - 1)$. Consequently, the absolute value of the ratio of the coefficient of an even-order 
difference to the preceding coefficient exceeds $m - 1$ independent of the value of $\tau'$. 
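The bound on the ratio $u$ can be spot-checked numerically. The following sketch (the function name and the grid of test values are arbitrary choices, not part of the text) evaluates $u$ on a grid inside the interval $-0.5 < \tau' < 0$ and on its mirror image, and compares $|u|$ with $m - 1$.

```python
def u(tau_prime, m):
    """Ratio B_(2m) / B_(2m-1) of consecutive coefficients in Bessel's formula."""
    return (tau_prime ** 2 - (2 * m - 1) ** 2 / 4) / (2 * m * tau_prime)


# Grid strictly inside -0.5 < tau' < 0 (the end points are excluded).
negative_grid = [-0.5 + 0.01 * i for i in range(1, 50)]

for m in (1, 2, 3, 4):
    smallest = min(abs(u(t, m)) for t in negative_grid + [-t for t in negative_grid])
    print(m, round(smallest, 4), smallest > m - 1)

boundary = u(-0.5, 2)        # at tau' = -0.5 the ratio equals m - 1
```

A check on a finite grid is of course no proof; it only illustrates the monotonic behaviour derived above.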


28. GENERAL REMARKS ABOUT THE APPLICATION 
OF THE INTERPOLATIONAL DIFFERENCE FORMULAS 


If the given value of the argument is close to the lowest value 
tabulated, we may take only Newton’s formula for interpolating 
forward since use of the other formulas requires knowledge of 
basic points preceding the one taken as the initial basic point. 
For the same reason, we can use only Newton’s formula for inter- 
polating backward if the given value of the argument is close to 
the highest tabulated value. However, if the given value is such 
that the value taken as initial lies in the interior of the table, 
we may choose any of the difference formulas. 

We mention two criteria that are used in regard to this. The 
simplest criterion has to do with the values of the coefficients 
of differences of successive orders in the different formulas. 
For a given value of $t$ and the corresponding value of $\tau$, the most 
suitable formula is the one with the smallest coefficients of the 
differences, since the unavoidable errors that occur on rounding 
off the table and the differences are thus decreased. For use in 
connection with this criterion, we give a table for the different 
interpolational formulas: 


 Order of                        Interpolational formula
difference     τ      Newton’s (forward     Stirling’s     Bessel’s (first
                       interpolation)                         variant)

                0.1       -0.045              +0.005          -0.045
                0.2       -0.080              +0.020          -0.080
   2nd          0.3       -0.105              +0.045          -0.105
                0.4       -0.120              +0.080          -0.120
                0.5       -0.125              +0.125          -0.125

                0.1       +0.0285             -0.0165         +0.006
                0.2       +0.0480             -0.0320         +0.008
   3rd          0.3       +0.0595             -0.0455         +0.007
                0.4       +0.0640             -0.0560         +0.004
                0.5       +0.0625             -0.0625         +0.000

                0.1       -0.0207             -0.0004         +0.0078
                0.2       -0.0336             -0.0016         +0.0144
   4th          0.3       -0.0402             -0.0034         +0.0193
                0.4       -0.0416             -0.0056         +0.0224
                0.5       -0.0391             -0.0078         +0.0234
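The entries of this table follow from the closed forms of the coefficients given in the preceding sections, and they can be regenerated as a check. The sketch below uses illustrative function names; the Bessel coefficients are computed with $\tau' = \tau - \tfrac{1}{2}$, as in the first variant.

```python
def newton_forward(tau, order):
    """tau (tau - 1) ... (tau - order + 1) / order!"""
    coeff = 1.0
    for i in range(order):
        coeff *= (tau - i) / (i + 1)
    return coeff


def stirling(tau, order):
    """Stirling coefficients of orders 2, 3, and 4."""
    return {2: tau ** 2 / 2,
            3: tau * (tau ** 2 - 1) / 6,
            4: tau ** 2 * (tau ** 2 - 1) / 24}[order]


def bessel(tau, order):
    """Bessel coefficients (first variant) of orders 2, 3, and 4."""
    t = tau - 0.5                       # tau' = tau - 1/2
    return {2: (t ** 2 - 0.25) / 2,
            3: t * (t ** 2 - 0.25) / 6,
            4: (t ** 2 - 0.25) * (t ** 2 - 2.25) / 24}[order]


for order in (2, 3, 4):
    for tau in (0.1, 0.2, 0.3, 0.4, 0.5):
        print(order, tau,
              f"{newton_forward(tau, order):+.4f}",
              f"{stirling(tau, order):+.4f}",
              f"{bessel(tau, order):+.4f}")
```

Comparing absolute values row by row gives the same conclusion as the table: the coefficients of Newton's formula are never smaller than those of the other two formulas.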




As was stated above, it is sufficient to have values of $\tau$ given 
from 0 to 0.5 since, for values of $\tau$ exceeding 0.5, we need only 
take the opposite initial value. Only the absolute values of the 
coefficients are necessary, though we included the signs as well. 
Comparison of the coefficients shows that, in all cases, the 
coefficients in Newton’s formula are at least as great as the 
coefficients in the other two formulas and in several cases they 
are greater. 

Another criterion as to the relative values of the interpolational 
formulas is the estimate of interpolational error. This estimate 
depends on the properties of the functions being interpolated, on 
the number of basic points, and on the values of the argument. 
The combinations of basic points are different for the different 
formulas used. The maximum absolute value of a derivative of 
order $n + 1$ is obtained in different regions of the argument. 
Therefore, somewhat different estimates of the interpolational 
error are obtained from the different formulas. If it is possible 
to make a comparison of the estimates of the interpolational 
errors obtained from the different formulas, the formula with 
the smallest estimate should be chosen. 

In conclusion, we shall give a brief summary of the rules for 
applying the formulas given in this chapter for point interpolation. 


1. We choose an initial value $t_0$ such that $|t - t_0| < h$. If 
$t > t_0$, we interpolate forward; if $t < t_0$, we interpolate backward. 

2. We compile a table of differences and determine the degree 
of the interpolational polynomial from the order of the differences 
that remain approximately constant. 


3. We calculate the argument $\tau = \dfrac{t - t_0}{h}$ if we are using the 
formulas of Newton or Stirling or the second variant of Bessel’s 
formula and $\tau' = \tau - \tfrac{1}{2}$ if we are using the first variant of Bessel’s 
formula. 

4. We then calculate the consecutive terms of the chosen 
formula until we reach the terms that no longer have an effect on 
the result. If the table contains $m$ digits to the right of the 
decimal, the consecutive terms are often calculated with $m + 1$ digits. 
The extra digit is kept while we are adding but is discarded when 
we round the figure off after the addition is performed. 


Part III 


PROBABILITY THEORY 


Chapter 7 
RANDOM EVENTS; BASIC CONCEPTS 


29. RANDOM EXPERIMENTS 


The concept of probability is introduced in order to study those 
experiments for which it is impossible to accurately predict the 
results of future observations, even when the conditions under 
which the experiment will take place are known. 


Examples. (a) Because of the imperfections of our instruments and various other 
random causes, random errors appear in all types of measurements. When we begin 
to take measurements, we cannot tell in advance how great the errors will be, nor can 
we give sufficiently narrow bounds within which the error will lie. 

(b) The owner of a lottery ticket cannot predict his winnings before the drawing. 

(c) If a star is chosen at random from a list of stars, one cannot predict its basic 
characteristics. 

(d) If a person is chosen at random from a group, for example, a group of recruits, 
we cannot say in advance what the color of his eyes will be. 

(e) If a box contains white and black balls and one of them is drawn, we cannot say 
without looking what color this ball will be. 


Experiments such as these are said to be random. 

When we consider the results that can be expected in a random 
experiment, it often happens that there will be not one definite re- 
sult but several that can occur. Here, the a priori considerations 
(based on experience) depend to some extent on the conditions under 
which the experiment will take place. 


Examples. (a) If it is known that the absolute value of the random error incurred 
when a measurement is made with an angle-measuring instrument does not exceed 10', 
the question as to the magnitude of the error that will be made in a future observation 
may be answered, for example, as follows: errors with absolute values from 0' to 2', or 
from 2' to 4', or from 4' to 6', etc. are possible. 

(b) If a star of a certain spectral class is picked from a list, we may give approxi- 
mate possible bounds for its absolute magnitude that are narrower than they would be if we 
had no information on the spectral type. 

(c) If a box contains three white balls and two black ones, when a ball is drawn, we 
may say that the possible results are (1) a white ball or (2) a black one. 

(d) If a coin is tossed, we may get either heads or tails. 

(e) If a die is thrown, the following results are possible: 

1, 2, 3, 4, 5, 6. 


Multiple observations of random experiments give certain 
practical criteria by means of which (in view of the known conditions) 
we may expect some of the possible results to be more likely than 
others. 

Experience with measurements indicates that small random 
errors are encountered more often than large ones. Because of 
this, we feel that it is more likely that we shall get a small error 
in our measurements than a large one (provided all the necessary 
technological conditions are met). In all such cases, the words 
‘‘possible,’’ ‘‘probable,’’ ‘‘almost certain,’’ etc., are used in 
practice to give a qualitative idea of the degree of certainty of some 
result of a future observation. 

When we characterize the ‘‘degree of certainty’’ quantitatively, 
we agree in advance to consider only those random experiments 
that satisfy the following conditions: 

(1) The number $n$ of possible results of an observation is finite. 
Let us denote these results by the letters 

$$A_1, A_2, \ldots, A_k, A_{k+1}, \ldots, A_n.$$

We shall call any possible result of a random experiment an out- 
come. In particular, the results $A_i$ are outcomes. 


The word ‘‘outcome’’ should be understood in a rather broad sense. If we are con- 
sidering various forms of precipitation, rain will be an ‘‘outcome’’ and the absence of 
any precipitation at all will also be an ‘‘outcome.’’ If we are considering the possible 
physical characteristics of a randomly chosen star, its belonging to any one particular 
spectral class is an outcome. The various possible wave lengths are also outcomes. 
To understand this last example properly, it should be borne in mind that we are speaking 
of the possible physical characteristics before the star is investigated (i.e., the ‘‘experi- 
ment’’ is the act of investigating). 


(2) The outcomes listed (that is, the outcomes $A_1, \ldots, A_n$) 
constitute a complete set of outcomes; that is, at least one of them 
is certain to happen.* 

(3) The outcomes $A_1, \ldots, A_n$ are mutually exclusive; that is, if 
one of them happens, the others cannot happen. (The term ‘‘pair- 
wise exclusive’’ is also used.) 


If the answer to the question in example (a) above as to the magnitude of the absolute 
value of the error is that an error with absolute value from 0' to 5' or an error from 5' 
to 10' are possible, these outcomes are mutually exclusive if we make an agreement as 
to which of these possibilities represents the case in which the absolute value is exactly 
5'. 


(4) The outcomes $A_1, \ldots, A_n$ are equally likely. We shall con- 
sider the concept of equally likely outcomes as basic, that is, re- 
flecting the properties of the experiment. The use of this concept 
in various problems is motivated primarily by the symmetry of an 
experiment with respect to the outcomes. In certain problems, the 
assumption of equal likelihood is a hypothesis concerning the 
properties of the experiment, which can be subject to verification by 
observation. It is only in the very simplest cases that we can 
easily show whether the assumption of equal likelihood of the 
enumerated outcomes in a list similar to that given in paragraph 1 
is admissible. 


*The phrase ‘‘these outcomes are the only ones possible’’ is also used in the litera- 
ture to indicate that the set is complete. 


To illustrate, let us suppose that a die is thrown. Any one of the six faces from 1 to 
6 can appear. If the die is a perfect homogeneous cube, then throwing the die is a 
random experiment. The entire set of outcomes consists of the appearance of faces 
with 1, 2, 3, 4, 5, and 6 spots. These outcomes are mutually exclusive and, because of 
the symmetry of the faces with respect to the center of gravity about which the die 
rotates when it is thrown, these outcomes are equally likely. If the die consisted of a 
wooden part and an iron part, the center of gravity would be displaced toward the iron 
part, and if the die were to be thrown, it would naturally be more likely that a number 
on a wooden face would appear. The assertion of equal likelihood of the faces would not 
be justified once we know the composition of the die, since there is no symmetry about 
the center of gravity. If we had no such information, we could initially make the hypoth- 
esis of equal likelihood for the numbers 1, 2, ..., 6. For definiteness, let us suppose 
that the single dot is placed on an iron face and that the 6 is on a wooden face. If the die 
is thrown many times, the 6 will be thrown many more times than the 1, and we should 
have to reconsider our hypothesis of equality of likelihood of all the numbers from 1 to 6. 

Let us consider a second example, which we shall frequently encounter in what 
follows. A coin is tossed. The question is which side will appear on top. There are only 
two possible results (heads and tails), and they are mutually exclusive. If we assume 
that the coin is an ideal thin homogeneous disc, we can consider these two outcomes as 
being equally likely, since there is no reason to suppose that one side is in any way 
different from the other or that one has a better chance of appearing on top. 

It is much more difficult to establish the equality of likelihood of natural phenomena. 
From a formal point of view, in considering certain outcomes equally likely, we intro- 
duce supplementary assumptions into the problem that are not always strictly justifiable 
from a physical point of view. 

Let us consider another example. Three white balls and two black ones are in a box. 
Let us suppose that these five balls are indistinguishable to the touch and that they have 
been thoroughly mixed. Without looking into the box, we take out the first ball that we 
touch. Before looking to see, we ask what is the color of the ball that has been drawn. 
There are two possible answers: white and black. These two outcomes are mutually 
exclusive since, by hypothesis, we take out only one ball. We cannot, however, call these 
outcomes equally likely since there are more white balls than black ones. To get equally 
likely outcomes, let us number the balls so that numbers 1, 2, and 3 are white and 4 and 5 
are black. We can now list the outcomes differently: Outcome (1) corresponds to ball 
number 1, which is white; outcome (2) to ball 2, which is white; (3) to ball 3, which is 
white; (4) to ball 4, which is black; and (5) to ball 5, which is black. These five outcomes 
are the only possible outcomes; they are mutually exclusive; and they are equally likely 
since the balls are assumed to be all just alike. 


30. THE CLASSICAL DEFINITION OF PROBABILITY 


The first problem in probability theory is that of explaining the 
concept of probability, that is, of a number characterizing the 
degree of likelihood of a particular result of a future observation. 

Let us suppose that the possible outcomes of an observation are 

$$A_1, A_2, \ldots, A_k, A_{k+1}, \ldots, A_n.$$

Suppose that these are the only possible outcomes, that they are 
mutually exclusive, and that they are equally likely. The result 
whose probability we are seeking to determine we shall call an 
event. 

Suppose that $C$ is an event that may or may not take place in 
future observations. Suppose also that, among the enumerated 
possible outcomes $A_1, A_2, \ldots, A_n$, we single out those outcomes un- 
der which the event will take place. Suppose that these are the out- 
comes $A_1, \ldots, A_k$, where $k \le n$. We shall call these outcomes 
favorable for the event $C$. 

We give the following definition: 

The ratio of the number of outcomes that are favorable for a 
given event $C$ to the number of equally possible mutually exclusive 
outcomes representing the only possible outcomes is called the 
probability of a random event: 

$$p = P(C) = \frac{k}{n}. \qquad (7.1)$$


To calculate the probability, we must first enumerate all the 
possible outcomes and then single out those that are favorable for 
the event. Let us note that in certain problems all the possible 
equally likely, mutually exclusive outcomes can be listed dif- 
ferently. For example, the list of outcomes that may occur when a 
die is thrown can be given in the following two forms: 

(a) An odd number (1, 3, 5); an even number (2, 4, 6). 

(b) The numbers 1, 2, 3, 4, 5, 6. 

We now give some consequences of the definition of probability. 

1. Suppose that $k = 0$. Then, by definition, $p = 0$. Since there 
are no outcomes that are favorable for this event, the event is im- 
possible. Thus, the probability of an impossible event is 0. 

2. Suppose that $k = n$. On the basis of the definition, we have 
$p = 1$. Since all the outcomes are favorable for the event, it will 
certainly take place. Such an event is said to be certain. Thus, 
the probability of a certain event is unity. 

3. In the general case, $0 \le k \le n$. Consequently, $0 \le p \le 1$. The 
probability can assume values from 0 to 1. According to the 
classical definition, $p$ can take only rational values. 

4. If $k$ is the number of outcomes that are favorable for an 
event, the number of unfavorable outcomes is $n - k$. If we denote by 
$q$ the probability of the event not taking place, we obtain from the 
definition 

$$q = \frac{n - k}{n}, \qquad (7.2)$$

since the unfavorable outcomes are favorable for the non-occurrence 
of the event. Adding equations (7.1) and (7.2), we obtain 

$$p + q = 1. \qquad (7.3)$$
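The classical definition and its consequences can be illustrated by counting outcomes directly. The sketch below is only an illustration; the function name is hypothetical, and the outcomes are taken as given, that is, as the only possible, mutually exclusive, and equally likely ones.

```python
from fractions import Fraction


def classical_probability(outcomes, is_favorable):
    """p = k / n for a finite list of equally likely, mutually exclusive outcomes."""
    outcomes = list(outcomes)
    k = sum(1 for outcome in outcomes if is_favorable(outcome))
    return Fraction(k, len(outcomes))


die = range(1, 7)
p = classical_probability(die, lambda face: face % 2 == 1)        # an odd number
q = classical_probability(die, lambda face: face % 2 == 0)        # the event does not occur
impossible = classical_probability(die, lambda face: face > 6)    # k = 0
certain = classical_probability(die, lambda face: face >= 1)      # k = n

# The box of Section 29: balls 1, 2, 3 are white and balls 4, 5 are black.
balls = {1: "white", 2: "white", 3: "white", 4: "black", 5: "black"}
p_white = classical_probability(balls, lambda number: balls[number] == "white")

print(p, q, p + q, impossible, certain, p_white)
```

Exact fractions are used because, under the classical definition, a probability is always a ratio of two whole numbers.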


31. EXAMPLES OF THE CALCULATION OF A PROBABILITY 


Example 1. A coin is tossed. Determine the probability of its falling heads. 

The list of the only possible, equally likely, and mutually exclusive cases is (1) 
heads or (2) tails; that is, $n = 2$. Of these two possibilities, the favorable outcome is 
heads; that is, $k = 1$. Therefore, the probability $p = 1/2$. 

Example 2. A die is thrown. Determine the probability of getting a five. The only 
possible, equally likely, and mutually exclusive cases are 1, 2, 3, 4, 5, and 6. Only the 
single outcome of the die falling on five is favorable. Consequently, $p = 1/6$. 

Example 3. A die is thrown. Calculate the probability of getting an odd number. 

Method 1. The only possible, equally likely, and mutually exclusive outcomes are 1, 
2, 3, 4, 5, 6. Of these, the outcomes 1, 3, and 5 are favorable. Therefore, $p = 3/6 = 1/2$. 

Method 2. The two outcomes of an even number being thrown and an odd number being 
thrown are exhaustive, equally possible, and mutually exclusive. Only the second case of 
the odd number is favorable. Consequently, $p = 1/2$. 

Example 4. Two coins are tossed. Determine the probability of getting heads on each. 
We make up the table 


1st coin    heads    heads    tails    tails 
2nd coin    heads    tails    heads    tails 


It is clear from the table that $k = 1$ and $n = 4$. Therefore, $p = 1/4$. 

Example 5. A box contains three white and five black balls. The balls are mixed up 
and they are indistinguishable to the touch. Without looking into the box, we take out two 
balls at random. Calculate the probability that both of them will be black. 

The number of all possible outcomes is the number of ways in which we can take two 
balls out of a set of eight; that is, $n = C_8^2$. The number of favorable outcomes is the num- 
ber of ways in which we can take two black balls from a set of five black balls; that is, 
$k = C_5^2$. Therefore, we obtain 

$$p = \frac{C_5^2}{C_8^2} = \frac{10}{28} = \frac{5}{14}.$$

Example 6. There are twelve white and eighteen black balls in a box. We take out 
ten balls. Calculate the probability of taking out exactly four white and six black balls. 

The total number of cases is the number of ways in which it is possible to take ten 
balls out of a set of thirty; that is, $n = C_{30}^{10}$. The number of favorable outcomes is de- 
termined as follows: the number of ways in which it is possible to take four white balls 
out of a set of twelve is $C_{12}^4$. To each set of four white balls, there correspond $C_{18}^6$ sex- 
tuples of black balls. Therefore, the number of favorable outcomes is $C_{12}^4 C_{18}^6$. The 
probability of taking four white and six black balls is 

$$p = \frac{C_{12}^4\, C_{18}^6}{C_{30}^{10}} = \frac{495 \cdot 18{,}564}{30{,}045{,}015} \approx 0.306.$$


This problem is an example from sampling theory. In various fields, we have 
occasion to deal with the following problem. Suppose that we have a complete set of 
objects (the ‘‘general set’’). We wish to find the characteristics that can describe it to 
some degree. 

Examples: (a) For the set of all stars, assorted according to spectral class, find the 
percentage of spectroscopic binary stars. 

(b) For the set of small planets, determine the average declination. 

(c) Determine the percentage of rejects in an order of identical wares. 

To solve the problem completely, we would have to examine the entire set, which 
could be an extremely tedious procedure and sometimes impossible (for example, de- 
termining the percentage of unexploded artillery shells). In such cases, we take some 
portion of the entire set and determine the desired numerical characteristics of it. The 
problem then arises as to what the probability is of getting from this sample a value of 
the characteristic that is close to the characteristic of the entire set. Example 6 is 
one of the problems of sampling theory. The general set is the collection of balls in a 
full box in which the ratio of the number of white balls to the number of black balls is 
two to three. The question then is what is the probability of making a sampling of ten 
balls and obtaining the same ratio, that is, of judging the composition of the full box 
from the sample. 


It is clear from examples 4, 5, and 6 that the determination of
the total number of events and the number of favorable events is
not always so simple as in the first three examples. In particular,
if in the fourth example there were five coins instead of two, the
total number of events would be thirty-two and the direct listing of
them would be tedious. Therefore, one of the problems of probability
theory is to derive rules for calculating probabilities of
certain events from the known probabilities of other events. Such
rules will be given in the following two sections. Another, extremely
important, example is the problem of establishing the conditions
under which the probability will be close to 1 or to 0.

This problem is connected with the assumptions, since the event 
can be considered almost certain if its probability differs only 
slightly from unity and almost impossible if its probability is close 
to zero. 


32. A THEOREM ON THE ADDITION OF PROBABILITIES 


THEOREM. The probability that one or the other of two mutually
exclusive events C and D will take place is equal to the sum of the
individual probabilities of these events.

Proof: Suppose that the list of exhaustive, equally likely, and 
mutually exclusive events is of the form 


$$A_1, A_2, \ldots, A_k, A_{k+1}, \ldots, A_{k+l}, A_{k+l+1}, \ldots, A_n.$$

Suppose that

$$A_1, A_2, \ldots, A_k$$

are the only cases that are favorable for the event C. Since the
events C and D are mutually exclusive, none of these k cases can
be favorable for the event D. Let us suppose that $A_{k+1}, A_{k+2}, \ldots, A_{k+l}$
are all the outcomes that are favorable for the event D. We denote
the probabilities of the events C and D by P(C) and P(D). Of all n
outcomes, k are favorable for C and l are favorable for D. Therefore,

$$P(C) = \frac{k}{n}, \qquad P(D) = \frac{l}{n}.$$

The cases $A_1, A_2, \ldots, A_k, A_{k+1}, \ldots, A_{k+l}$ taken together are favorable
for either the event C or the event D. Therefore,

$$P(C \text{ or } D) = \frac{k+l}{n}.$$


When we compare this last equation with the preceding two
equations, we obtain*

$$P(C \text{ or } D) = \frac{k}{n} + \frac{l}{n},$$

or

$$P(C \text{ or } D) = P(C) + P(D). \tag{7.4}$$

*In some textbooks, the occurrence of the event C or the event D is denoted by
C + D (symbolic addition), and the theorem on the addition of probabilities is written
in the form

P(C + D) = P(C) + P(D)

if C and D are mutually exclusive.


COROLLARY 1. If the events $C_1, C_2, \ldots, C_s$ are mutually exclusive
and their probabilities are $p_1, p_2, \ldots, p_s$, then

$$P(C_1 \text{ or } C_2 \ldots \text{ or } C_s) = p_1 + p_2 + \ldots + p_s. \tag{7.5}$$


COROLLARY 2. Let C denote the occurrence of any one of a
set of mutually exclusive events $C_1, C_2, \ldots, C_s$. Let us agree to
call the individual events $C_1, \ldots, C_s$ mutually exclusive ways in
which the event C can happen. Then, the theorem of addition for s
events can be formulated in the following manner:

The probability of the event C, which can happen in the different
ways $C_1, \ldots, C_s$, is equal to the sum of the individual probabilities
of each of these ways.

COROLLARY 3. Suppose that in a certain experiment, some one
of the mutually exclusive events $C_1, C_2, \ldots, C_n$ must necessarily
occur. Suppose that their probabilities are respectively equal to
$p_1, p_2, \ldots, p_n$. Then,

$$p_1 + p_2 + \ldots + p_n = 1. \tag{7.6}$$

Proof: From the addition theorem, the probability of any one of
the mutually exclusive events $C_1, C_2, \ldots, C_n$ is equal to the sum
of the probabilities of these events and, by hypothesis, one of these
events must certainly take place, so that this probability is equal to unity.

Formula (7.3) is a special case of formula (7.6). For the event
either will take place (we denote the probability of this by p) or it
will not take place (we denote the probability of this by q). These
results are mutually exclusive. Therefore, if we apply formula
(7.6), we can write p + q = 1.


Example 1. A box contains five white, seven green, and eight red balls. Determine
the probability of taking a ball at random and getting either a green or a red one.

The probability of getting a green ball is

P(green) = 7/20.

The probability of getting a red ball is

P(red) = 8/20.

From the addition theorem, we have

P(green or red) = 7/20 + 8/20 = 15/20 = 3/4.

This result can also be derived by a simple calculation. Since we are determining the
probability of the event "getting either a green or a red ball," the number of favorable
events is k = 7 + 8 = 15 and n = 20. Therefore,

P(green or red) = 15/20 = 3/4.

In this example, it would have been possible not to give the number of balls; the
probabilities would have been sufficient.




Example 2. A die is thrown. Determine the probability of getting either a five or an
even number.

From the results of example 2 of the preceding section, we have

P(5) = 1/6.

It is easy to see that

P(even number) = 1/2.

Since getting a five and getting an even number are mutually exclusive events,

P(five or even number) = 1/2 + 1/6 = 2/3.


Example 3. A die is thrown. Determine the probability of getting either a five or an
odd number.

The addition theorem is not applicable here because these events are not mutually
exclusive. However, since the first event is a particular way of getting the second, we
have

P(five or odd number) = P(odd number) = 1/2.
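Examples 2 and 3 can be verified by listing the six faces of the die and counting. The sketch below is only an illustration under the classical definition of probability; the helper `prob` and the predicate names are assumptions introduced here, not notation from the text.

```python
from fractions import Fraction

OUTCOMES = range(1, 7)  # faces of a fair die, all equally likely

def prob(event):
    """Classical probability: favorable outcomes divided by all outcomes."""
    favorable = sum(1 for face in OUTCOMES if event(face))
    return Fraction(favorable, len(OUTCOMES))

def is_five(face):
    return face == 5

def is_even(face):
    return face % 2 == 0

def is_odd(face):
    return face % 2 == 1

five_or_even = prob(lambda f: is_five(f) or is_even(f))
five_or_odd = prob(lambda f: is_five(f) or is_odd(f))

# Mutually exclusive events: the addition theorem (7.4) holds.
print(five_or_even == prob(is_five) + prob(is_even))  # True
# Overlapping events: adding the probabilities overcounts the face 5.
print(five_or_odd == prob(is_five) + prob(is_odd))    # False
print(five_or_odd)                                    # 1/2
```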


33. THE THEOREM ON THE MULTIPLICATION 
OF PROBABILITIES 


Definition 1. An event is said to be composite if it consists of the
occurrence of two or more events. These events are called the
components of the composite event. It is sometimes said that "a
composite event is a combination of events." (The word "combination"
is used somewhat broadly; the events referred to may
take place at different times or in different places.)

Definition 2. The probability of an event C calculated with the
assumption that an event D has taken place is called the conditional
probability of the event C given D and is denoted by P(C|D). Probabilities
of the form P(C) are sometimes called unconditional.

THEOREM. The probability of a composite event representing
a combination of two events is equal to the product of the probability
of one of them multiplied by the conditional probability of the
other given the first.

Proof: Suppose that the entire set of equally likely and mutually 
exclusive outcomes is of the form 


$$A_1, A_2, \ldots, A_k, A_{k+1}, \ldots, A_{k+l}, A_{k+l+1}, \ldots, A_n.$$

Let us suppose that only the first k of these outcomes are favorable
to both the components C and D of the composite event in question.
Suppose that the outcomes $A_1, A_2, \ldots, A_{k+l}$ represent all the outcomes
that are favorable for the event C. Among the outcomes
$A_{k+l+1}, \ldots, A_n$ there may be outcomes favorable for D and unfavorable
for C, but we are not concerned with them.

Let us write the decomposition in the following manner:

$$\overbrace{\underbrace{A_1, A_2, \ldots, A_k}_{\text{favorable for both } C \text{ and } D},\ \underbrace{A_{k+1}, \ldots, A_{k+l}}_{\text{unfavorable for } D}}^{\text{favorable for } C},\ \underbrace{A_{k+l+1}, \ldots, A_n}_{\text{unfavorable for } C}$$

By the definition of probability, we have

$$P(C) = \frac{k+l}{n}, \qquad P(C \text{ and } D) = \frac{k}{n}.$$



To calculate the probability of D given C, we reason as follows.

If the event C has taken place, this means that one of the outcomes
$A_1, A_2, \ldots, A_{k+l}$ has taken place; therefore, the number of
all the outcomes is equal to k + l. Among these, the outcomes
$A_1, A_2, \ldots, A_k$ are favorable for the event D. Consequently,

$$P(D|C) = \frac{k}{k+l}. \tag{7.7}$$

It is now easy to verify the theorem:*

$$P(C)\,P(D|C) = \frac{k+l}{n} \cdot \frac{k}{k+l} = \frac{k}{n} = P(C \text{ and } D), \tag{7.8}$$

$$P(C \text{ and } D) = P(C)\,P(D|C). \tag{7.9}$$


COROLLARY 1. If we switch the notations for the events, we
obtain

$$P(D \text{ and } C) = P(D)\,P(C|D). \tag{7.10}$$

Since P(D and C) = P(C and D), we have

$$P(C)\,P(D|C) = P(D)\,P(C|D). \tag{7.11}$$


This formula establishes a relationship between two unconditional 
probabilities (P(C) and P(D)) and two conditional probabilities of 
two events. We see that only three of these can be given arbitrarily. 
The fourth is determined from the formula. However, even the 
three probabilities cannot be given altogether arbitrarily since 
their values must not be such as to make the value of the fourth 
greater than unity. 

COROLLARY 2. The multiplication theorem for several events
can be written in the following form:

$$P(C_1 \text{ and } C_2 \text{ and } \ldots \text{ and } C_s) = P(C_1)\,P(C_2|C_1)\,P(C_3|C_1 \text{ and } C_2) \ldots P(C_s|C_1 \text{ and } C_2 \ldots \text{ and } C_{s-1}), \tag{7.12}$$

where $P(C_3|C_1 \text{ and } C_2)$ denotes the probability of the event $C_3$ given
$C_1$ and $C_2$, etc.


*A composite event (combination of events) is often denoted symbolically by a
multiplication sign; that is, instead of writing "C and D," we write C × D.

COROLLARY 3 (division of probabilities). From the formula

$$P(C \text{ and } D) = P(C)\,P(D|C) = P(D)\,P(C|D)$$

it follows that

$$P(D|C) = \frac{P(C \text{ and } D)}{P(C)}, \qquad P(C|D) = \frac{P(C \text{ and } D)}{P(D)}; \tag{7.13}$$


that is, the conditional probability of one event given another event 
is equal to the quotient obtained by dividing the probability of the 
composite event by the probability of the second event.

Definition 3. Events are said to be independent if the probability 
of each of them does not depend on whether the remaining events 
have taken place or not. 

If this condition is not satisfied, the events are said to be dependent.
Formally, the condition of independence of two events can
be written in the form

$$P(C|D) = P(C), \qquad P(D|C) = P(D).$$


Here, it follows from the definition of conditional probability that the 
second of these equations is a consequence of the first. 

The multiplication theorem proven above is applicable to de- 
pendent events. In the case of independent events, the multiplication 
theorem can be worded more simply: 

The probability of a composite event is equal to the product of
the probabilities of the components of that composite event:

$$P(C_1 \text{ and } C_2 \text{ and } \ldots \text{ and } C_s) = P(C_1)\,P(C_2) \ldots P(C_s). \tag{7.14}$$

The validity of the theorem is clear from the fact that, from the
definition of independence, the probability of the event $C_2$ does not
depend on the occurrence or non-occurrence of the event $C_1$.
Therefore,

$$P(C_1 \text{ and } C_2) = P(C_1)\,P(C_2|C_1) = P(C_1)\,P(C_2). \tag{7.15}$$


The proof for an arbitrary number of component events proceeds 
analogously. 
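The formal condition of independence, P(C|D) = P(C), can be tested by direct enumeration. The sketch below uses two fair dice, which is a hypothetical illustration chosen here and not an example from the text; the helpers `prob` and `conditional` are likewise assumed names, with `conditional` applying the division formula (7.13).

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered results of throwing two fair dice.
OUTCOMES = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability of an event over the 36 outcomes."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))

def conditional(event, given):
    """P(event | given) computed by the division formula (7.13)."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

def first_is_six(o):
    return o[0] == 6

def second_is_six(o):
    return o[1] == 6

def sum_is_twelve(o):
    return o[0] + o[1] == 12

# Independent: the second die is unaffected by the first.
print(conditional(second_is_six, first_is_six) == prob(second_is_six))  # True
# Dependent: a sum of twelve becomes far more likely once the first die shows six.
print(conditional(sum_is_twelve, first_is_six) == prob(sum_is_twelve))  # False
```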


Example 1. Two coins are tossed. Determine the probability of getting heads both
times.

The events of getting heads on the individual coins are independent, since the probability
of getting heads on the second coin obviously does not depend on whether the first coin fell
heads or tails. In each case, the probability of getting heads is 1/2. By the multiplication
theorem, the probability of getting heads both times, that is, the probability of an
event representing a combination of two events, can be written in the form

P(heads and heads) = P(heads) · P(heads) = 1/2 · 1/2 = 1/4.


Example 2. A box contains five white and four black balls. Suppose that you plan to
take out one ball, look to see what color it is, put it back, mix the balls up again, and again
take out a ball. What is the probability (before the experiment) of getting black balls both
times?

The two events of getting a black ball the first time and getting a black ball the second
time are, because of the way in which the experiment is conducted, independent. Therefore,

P(black and black) = P(black) · P(black) = 4/9 · 4/9 = 16/81.




Example 3. A box contains five white and four black balls. This time, the plan is to
take out two balls one after the other without putting the first one back. Determine the
probability of getting black balls both times.

Here, the probability of getting a black ball the second time depends on whether the
first ball was black or white. In the first case, this probability is equal to 3/8, and in the
second, it is equal to 1/2. Our two events are dependent, and therefore,

P(first black and second black) = P(first black) · P(second black | first black)

or

P(first black and second black) = 4/9 · 3/8 = 1/6.

Here, P(second black | first black) denotes the probability of getting a black ball the second
time, given that the first ball was black.
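The contrast between Examples 2 and 3 (drawing with and without replacement) can be reproduced by listing every equally likely ordered pair of balls. This is a sketch for checking the arithmetic only; the list `BALLS` and the function `prob_both_black` are names assumed for the illustration.

```python
from fractions import Fraction
from itertools import permutations, product

BALLS = ["white"] * 5 + ["black"] * 4  # the box of Examples 2 and 3

def prob_both_black(pairs):
    """Share of equally likely ordered pairs in which both balls are black."""
    pairs = list(pairs)
    favorable = sum(1 for first, second in pairs
                    if first == "black" and second == "black")
    return Fraction(favorable, len(pairs))

# With replacement: any ball may be followed by any ball, itself included.
with_replacement = prob_both_black(product(BALLS, repeat=2))
# Without replacement: ordered pairs of two different balls.
without_replacement = prob_both_black(permutations(BALLS, 2))

print(with_replacement)     # 16/81
print(without_replacement)  # 1/6
```

`itertools.permutations` treats the nine balls as distinct by position, so balls of the same color are still counted as separate outcomes, as the classical definition requires.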


34. TOTAL PROBABILITY HYPOTHESES 


Suppose that an event C can occur when one or several of the
mutually exclusive random conditions $H_1, H_2, \ldots, H_n$ are satisfied.
These conditions are called hypotheses. Since the hypotheses are
random, we must give their probabilities $P_1, P_2, \ldots, P_n$. The
list of hypotheses must be complete. Therefore, from the addition
theorem,

$$\sum_{k=1}^{n} P_k = 1.$$

Let us suppose also that the conditional probabilities of the event are
given for each of the hypotheses. We denote them by $p_k = P(C|H_k)$,
for k = 1, 2, ..., n.

The total probability of an event C under the stated conditions is 
the probability that the event will take place, computed under the 
assumption that first a particular one of the hypotheses is satisfied. 

To calculate the total probability, we reason as follows: the
event C can take place in any of the ways $H_k \times C$ for k = 1, 2, ..., n,
that is, as the composite of the hypothesis $H_k$ and the event C. The
probability of each way in which C can happen can be calculated
from the multiplication theorem for dependent events since the
probability of the event C depends on which of the hypotheses is
satisfied. We obtain

$$P(H_k \times C) = P(H_k)\,P(C|H_k) = P_k p_k.$$

The ways $H_k \times C$ in which C can occur are mutually exclusive.
Therefore, from the addition theorem,

$$P(C) = \sum_{k=1}^{n} P(H_k \times C).$$

The total probability can be calculated from the formula

$$P(C) = \sum_{k=1}^{n} P_k p_k. \tag{7.16}$$


To make clear the derivation of the general formula, we repeat 
this reasoning for a particular case. 




Example. Suppose that we have three red boxes, each containing four white and five
black balls, and seven green boxes, each containing two white and three black balls.
All the boxes are just alike except for color and so are all the balls. The balls are
thoroughly mixed up. We take a ball out of the first box we put our hands on. The conditions
of the experiment are such that we cannot say which of the boxes this will be.
Determine the probability of getting a white ball.

To solve this problem, we should first note that the event in question, getting a white
ball, can happen in two mutually exclusive ways since a white ball can be taken from any
of the green boxes and from any of the red boxes. Therefore, from the addition theorem,

P(white ball) = P(white ball from green box) + P(white ball from red box).

Getting a white ball from any of the green boxes is a composite event consisting of two
events since for it to happen it is necessary that (1) we choose a green box and (2) we
take a white ball from it. The same is true with regard to getting a white ball from a red
box. From the theorem on multiplication (for dependent events), we obtain

P(white ball from green box) = P(green box) P(white ball | green box),
P(white ball from red box) = P(red box) P(white ball | red box).

Here, P(green box) is 7/10 and P(red box) is 3/10, P(white ball | green box) is the probability
of taking a white ball given that we have chosen a green box, and P(white ball | red
box) is the probability of getting a white ball given that we have chosen a red box. Since

P(white ball | green box) = 2/5

and

P(white ball | red box) = 4/9,

the desired probability of getting a white ball is equal to

P(white ball) = 7/10 · 2/5 + 3/10 · 4/9 = 31/75.


The solution can be written in the following table:

Box      Conditional probability      Probability of choosing
         of getting a white ball      such a box
Green    2/5                          7/10
Red      4/9                          3/10

To solve this problem, we did not actually need to know the composition of the boxes.
It would have been sufficient to know the probability of drawing a white ball given that we
have chosen a red box (or a green one). Also, instead of knowing the number of boxes, it
would have been sufficient to know the percentage that were red and the percentage that
were green. We note that it would not have been correct to calculate the overall number
of white and black balls in all the boxes and then divide the number of white balls by the
number of balls of both colors, that is, to use the definition of probability directly. This
can be done only when we are equally likely to draw one ball as another. In the present
case, the proportion of white balls is somewhat greater in the red boxes than in the green boxes.
Therefore, we cannot assume that the balls are evenly mixed or that we have an equal
chance of getting each of the balls.
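Formula (7.16) and the warning about pooling all the balls can both be checked numerically. The following is a minimal sketch; the function name `total_probability` and the variable names are assumptions made for the illustration.

```python
from fractions import Fraction

def total_probability(priors, conditionals):
    """Formula (7.16): P(C) is the sum over k of P_k * p_k."""
    if sum(priors) != 1:
        raise ValueError("the list of hypotheses must be complete: priors must sum to 1")
    return sum(P * p for P, p in zip(priors, conditionals))

priors = [Fraction(3, 10), Fraction(7, 10)]      # red box, green box
conditionals = [Fraction(4, 9), Fraction(2, 5)]  # P(white | red), P(white | green)
p_white = total_probability(priors, conditionals)
print(p_white)  # 31/75

# The incorrect shortcut: pool the balls of all ten boxes and count.
white_balls = 3 * 4 + 7 * 2
all_balls = 3 * (4 + 5) + 7 * (2 + 3)
pooled = Fraction(white_balls, all_balls)
print(pooled)   # 13/31, which differs from 31/75
```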


35. A PRIORI AND A POSTERIORI PROBABILITIES 
OF THE HYPOTHESES 


Suppose that the occurrence or non-occurrence of an event is connected
with certain assumptions as to the causes of that event or as
to the conditions under which it will take place. To each of these
assumptions there corresponds a definite probability of the event.
Furthermore, the assumptions (hypotheses) are also considered
random. Consequently, for the problem to be defined, the probabilities
of the hypotheses must be given. For example, if we take a
ball out of a box containing five balls and if we know only that there
are white and black balls in the box, there are numerous assumptions
(hypotheses) that we could make as to the composition of the box.
To each of these hypotheses there will correspond a definite probability
of getting a white ball. In this example, we have the following
hypotheses and the corresponding probabilities of getting a white
ball.


Hypothesis                P(white)
1) 4 white, 1 black       0.8
2) 3 white, 2 black       0.6
3) 2 white, 3 black       0.4
4) 1 white, 4 black       0.2


If we take a ball and it turns out to be white, the question then 
arises as to which of the above hypotheses is the most likely one. 
Solution of problems of this kind can be of value in judging the 
probability of random causes of an event that occurs in an experi- 
ment when there are several such causes. The possibility of 
solving such a problem (when properly stated) is clear from the 
following simple considerations. 

In the present example, if it were not stated that there were both
black and white balls in the box, we would have to add two
other hypotheses:

5) 5 white and 0 black,
6) 0 white and 5 black.

The drawing of a white ball would then show us that the probability of
the sixth hypothesis is zero.

Probabilities of hypotheses in the conditions of a problem before 
an experiment is made are called a priori probabilities, and proba- 
bilities that are calculated after an experiment resulting in some 
event are called a posteriori probabilities. 

Let us now consider the following problem. Let us suppose that
an event C can be explained by n mutually exclusive exhaustive
hypotheses:

$$H_1, H_2, \ldots, H_n.$$

Let us suppose that the probability of each hypothesis is known from
experiment. We denote these a priori probabilities by $P_1, P_2, \ldots, P_n$.
As we know,

$$\sum_{k=1}^{n} P_k = 1.$$




Finally, let us suppose that, for each of these hypotheses, we
know the probability of C under the condition that the occurrence
of C is explained by that hypothesis. We denote these probabilities
by $p_1, p_2, \ldots, p_n$. It follows that $p_k$ is the conditional probability
of the event C, under the assumption that the cause corresponding
to the hypothesis $H_k$ (for k = 1, 2, ..., n) was in fact the one which
resulted in the occurrence of the event. What we wish to determine
now is the probability of each of these hypotheses, if it is known
that the event occurred.

Let us solve this problem. The conditional probability of the
hypothesis $H_k$ (for k = 1, 2, ..., n) under the assumption that the
event C has taken place can be calculated from the division theorem
(7.13):

$$P(H_k|C) = \frac{P(C \times H_k)}{P(C)},$$

where $P(C \times H_k)$ is the probability of the combination of the two
events, namely, C and $H_k$, that is, of the occurrence of C as a
consequence of the cause corresponding to the hypothesis $H_k$. To
calculate $P(H_k|C)$, we must calculate $P(C \times H_k)$ and P(C).

The first of these can be determined from the multiplication
theorem for dependent events if we consider as the first event the
occurrence of the hypothesis $H_k$:

$$P(C \times H_k) = P(H_k)\,P(C|H_k) = P_k p_k.$$

The probability of the event C is, from formula (7.16), equal to the
total probability of the event C:

$$P(C) = \sum_{s=1}^{n} P_s p_s;$$

therefore,

$$P(H_k|C) = \frac{P_k p_k}{\sum_{s=1}^{n} P_s p_s}, \qquad k = 1, 2, \ldots, n. \tag{7.17}$$


From this formula we can calculate the a posteriori probabilities
of the hypotheses, that is, the probabilities given that the event C
has occurred.

The set of formulas (7.17) is usually known as Bayes' theorem.

COROLLARY. If the a priori (pre-experiment) probabilities of
the hypotheses are unknown, the hypotheses should be considered as
equally likely (in the absence of information). Consequently,

$$P_1 = P_2 = \ldots = P_n = \frac{1}{n}.$$

Then, from Bayes' formula,

$$P(H_k|C) = \frac{p_k}{\sum_{s=1}^{n} p_s}. \tag{7.18}$$


Thus, if the hypotheses are equally likely before an experiment, 
the (a posteriori) probabilities of the hypotheses after the experi- 
ment are, if the event has occurred, proportional to the conditional 
probabilities of the event (these conditional probabilities being 
calculated under the assumption that satisfaction of the correspond- 
ing hypotheses resulted in the occurrence of the event). 


Example 1. Suppose that a box contains four balls. We know nothing of the content of
the box except that there is at least one white and at least one black ball in it. We take a
ball out, and it turns out to be white. Then we replace it. Find the probable composition
of the box on the basis of this result.

From the condition of the problem, the following assumptions are possible for the
composition of the box.

I. Three white and one black.
II. Two white and two black.
III. One white and three black.

Since there are no other conditions, these three assumptions can be considered as equally
likely. Denoting the probabilities of these hypotheses by $P_1$, $P_2$, and $P_3$, we have

$$P_1 = P_2 = P_3 = \frac{1}{3}.$$

The probability of getting a white ball under hypothesis I is 3/4; under II, it is 1/2;
and under III, it is 1/4. From Bayes' formulas, we obtain the probabilities of the
hypotheses after the experiment:

$$P(\text{I} \mid \text{white}) = \frac{\frac{1}{3} \cdot \frac{3}{4}}{\frac{1}{3} \cdot \frac{3}{4} + \frac{1}{3} \cdot \frac{1}{2} + \frac{1}{3} \cdot \frac{1}{4}} = \frac{1/4}{1/2} = \frac{1}{2},$$

$$P(\text{II} \mid \text{white}) = \frac{\frac{1}{3} \cdot \frac{1}{2}}{\frac{1}{2}} = \frac{1}{3},$$

$$P(\text{III} \mid \text{white}) = \frac{\frac{1}{3} \cdot \frac{1}{4}}{\frac{1}{2}} = \frac{1}{6}.$$

Check: $\frac{1}{2} + \frac{1}{3} + \frac{1}{6} = 1$.

We see that the result of the experiment compels us to change our judgment as to the
probabilities of the hypotheses. Before the experiment, they were assumed to be equally
likely, but after the experiment, they have different probabilities and the most probable
one turns out to be the first.

Example 2. Let us consider a variation of example 1. Suppose that before the experiment
we know only that the box can contain white balls and balls of different colors. Then,
our table of hypotheses is as follows:

I. Four white balls and no ball of another color, $p_1 = 1$.
II. Three white balls and 1 ball of another color, $p_2 = 3/4$.
III. Two white balls and 2 balls of another color, $p_3 = 1/2$.
IV. One white ball and 3 balls of another color, $p_4 = 1/4$.
V. 0 white balls and 4 balls of another color, $p_5 = 0$.

We take the a priori probability of each of these hypotheses to be 1/5.
Suppose that we then take a ball and that it turns out to be white. Then, the a posteriori
probabilities of these hypotheses, calculated by means of Bayes' formulas, are

$$P(\text{I} \mid \text{white}) = \frac{1}{1 + \frac{3}{4} + \frac{1}{2} + \frac{1}{4} + 0} = \frac{1}{2.5} = 0.4;$$

$$P(\text{II} \mid \text{white}) = \frac{3/4}{2.5} = 0.3; \qquad P(\text{III} \mid \text{white}) = \frac{1/2}{2.5} = 0.2;$$

$$P(\text{IV} \mid \text{white}) = \frac{1/4}{2.5} = 0.1; \qquad P(\text{V} \mid \text{white}) = \frac{0}{2.5} = 0.$$

As we would expect, the fifth hypothesis is, on the basis of the result of the experiment,
impossible.
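Both examples follow the same recipe, formula (7.17), so they can be recomputed with one small routine. The sketch below is an illustration only; the function name `posteriors` and the variable names are assumptions, not part of the text.

```python
from fractions import Fraction

def posteriors(priors, conditionals):
    """Formula (7.17): P(H_k | C) = P_k p_k / sum over s of P_s p_s."""
    joint = [P * p for P, p in zip(priors, conditionals)]
    total = sum(joint)  # total probability of C, formula (7.16)
    if total == 0:
        raise ValueError("the event has zero probability under every hypothesis")
    return [j / total for j in joint]

# Example 1: three equally likely compositions of a four-ball box.
ex1 = posteriors([Fraction(1, 3)] * 3,
                 [Fraction(3, 4), Fraction(1, 2), Fraction(1, 4)])
print(*ex1)  # 1/2 1/3 1/6

# Example 2: five equally likely compositions, including the two extremes.
ex2 = posteriors([Fraction(1, 5)] * 5,
                 [Fraction(k, 4) for k in (4, 3, 2, 1, 0)])
print([float(x) for x in ex2])  # [0.4, 0.3, 0.2, 0.1, 0.0]
```

Because the a priori probabilities are equal in both examples, the same results follow from the simpler formula (7.18); the general routine is kept so that unequal priors can be tried as well.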


Chapter 8 
THE PROBLEM OF REPEATED TRIALS 


36. STATEMENT OF THE PROBLEM AND DERIVATION OF 
THE BASIC FORMULA 


Definition 1. If, in the case of a random experiment, a random 
event C may or may not occur, we shall call the establishment of 
conditions under which we can determine whether the given event 
has occurred or not a trial.

Every observation that is actually carried out is a trial. 

Definition 2. Repeated trials in each of which the random event 
C may or may not occur are said to be independent with respect to 
the event C if the probability of the event’s occurring is, in the 
case of each trial, independent of the results of other trials (that 
is, if this probability is independent of how many times the event 
has occurred in other trials). 


Example 1. A coin is tossed one hundred times. The probability of getting heads on
the 91st trial is 1/2 independent of whether the coin has landed heads 0 times, 10 times,
or even 90 times in the previous 90 trials.

We note that this example contradicts the general impression. If a coin has landed
heads 90 times, a player getting ready to toss the coin the 91st time is likely to expect
the coin to fall tails this time rather than heads, in other words, to assign a greater
probability to tails than to heads. The lack of foundation of this view can be easily
seen. As one author commented: "A coin has no memory." Therefore, when the coin
is tossed the 91st time, the probability of its landing heads depends only on the properties
of the coin, just as in the case of any other toss. As we shall see later, the situation
here is somewhat more complicated. The probability of the coin's landing heads 90 times
in a row is very small, and the probability of its landing heads 91 times in a row is also very
small, but this has to do with the probability of the set of results and not with the
probability of the result of any individual experiment, which is always 1/2, independent
of the results of the preceding experiments.

We note that the very small probability of the coin's landing heads 90 times in a row
(which we obtain under the assumption that the probability of its landing heads is equal
to 1/2 in each single experiment) may compel us to doubt the validity of this assumption
if the coin does actually fall heads 90 times in a row. For example, it is possible that
for some reason or other, the coin is so lacking in symmetry that the probability of its
landing heads in any particular toss is close to unity. However, we may not regard this
assertion as established, since an event of very low probability can nonetheless happen. It is
only when a series of experiments (for example, with 100 tosses of the coin in each)
invariably lead to analogous results (a marked preponderance of heads) that there is a
basis for reconsideration of the assumption of equal probability of the coin's landing
heads or tails in a single toss.
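The distinction drawn here, between the tiny probability of a long run and the unchanged probability of a single toss, can be made concrete with the division formula (7.13). The sketch assumes a symmetric coin with independent tosses; the variable names are illustrative.

```python
from fractions import Fraction

p_heads = Fraction(1, 2)

# Multiplication theorem for independent events, formula (7.14).
run_90 = p_heads ** 90  # heads on each of the first 90 tosses
run_91 = p_heads ** 91  # heads on each of the first 91 tosses

# Division formula (7.13): P(91st toss heads | first 90 tosses heads).
conditional_91st = run_91 / run_90

print(conditional_91st)  # 1/2
print(float(run_90))     # about 8e-28
```

The run itself is extraordinarily improbable, yet the conditional probability for the next toss is still 1/2.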




The problem of repeated trials is formulated as follows:
suppose that n trials are performed, all of them independent with
respect to the event C. Suppose that the probability of the event is
equal to a constant p in each trial. Calculate the probability of the
event's occurring k times (where $0 \le k \le n$); the order of occurrence
and non-occurrence is immaterial.

We shall derive the basic formula for solving this problem.
Let us denote by C the occurrence of the event in any trial and let
us denote by $\bar{C}$ the non-occurrence of the event. The probability of $\bar{C}$
we denote by q. Then,

$$p + q = 1.$$


Consider one of the possible sequences of occurrences and non-occurrences
of the event in n trials in which the event C occurs k
times:

$$C, C, \bar{C}, \bar{C}, \bar{C}, C, \bar{C}, \ldots, \bar{C}, \bar{C}.$$

In this listing, the symbol C should appear k times and the symbol
$\bar{C}$ should appear $n-k$ times. If we denote this sequence by the
letter $A_s$, by the theorem on multiplication of probabilities of independent
events we have

$$P(A_s) = p \cdot p \cdot q \cdot q \cdot q \cdot p \cdot q \cdots q \cdot q = p^k q^{n-k}.$$

Clearly, the probability of any other sequence of results such that
the event occurs k times is also $p^k q^{n-k}$ since the corresponding
product of the probabilities differs only by the order of the factors.
The probability of the event occurring k times (in any order),
which we denote by $P_{k,n}$, is calculated from the theorem on addition
of probabilities:

$$P_{k,n} = \sum_{s=1}^{m} P(A_s) = m\,p^k q^{n-k},$$


where m is the number of possible sequences. This number is the
number of ways in which we can write the letter "C" k times in
n places. Therefore, m is equal to the number of combinations of
k elements that can be taken from a set of n elements: $C_n^k$.

We thus obtain the basic formula

$$P_{k,n} = C_n^k\, p^k q^{n-k} = \frac{n!}{k!\,(n-k)!}\, p^k q^{n-k}. \tag{8.1}$$

If n and k are not small, we may use tables of binomial coefficients
and factorials to calculate $P_{k,n}$.


Example 2. Suppose that a die is thrown five times. Determine the probability of a 1
being thrown three times.

In this case, n = 5, k = 3, p = 1/6, and q = 5/6; hence

$$P_{3,5} = C_5^3 \left(\frac{1}{6}\right)^3 \left(\frac{5}{6}\right)^2 = \frac{5 \cdot 4}{1 \cdot 2} \cdot \frac{1}{216} \cdot \frac{25}{36} = \frac{125}{3888}.$$
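The basic formula for repeated trials translates directly into a few lines of code. This is a minimal sketch using exact fractions; the function name `binomial_probability` is an assumption made for the illustration.

```python
from fractions import Fraction
from math import comb

def binomial_probability(k, n, p):
    """Probability of exactly k occurrences in n independent trials,
    each with probability p: C(n, k) * p**k * q**(n - k) with q = 1 - p."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

# Example 2: a 1 thrown exactly three times in five throws of a die.
p_three_ones = binomial_probability(3, 5, Fraction(1, 6))
print(p_three_ones)  # 125/3888
```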


37. THE PROBABILITY DISTRIBUTION FOR THE NUMBER 
OF TIMES THAT AN EVENT MAY OCCUR 


If an experiment is performed n times, before the experiment we
can only say that an event will occur 0 times, 1 time, 2 times, ...,
or n times. If we set k = 0, 1, 2, ..., n in formula (8.1), we
obtain

$$P_{0,n} = q^n, \quad P_{1,n} = npq^{n-1}, \quad P_{2,n} = \frac{n(n-1)}{1 \cdot 2}\, p^2 q^{n-2}, \quad \ldots, \quad P_{n-1,n} = np^{n-1}q, \quad P_{n,n} = p^n. \tag{8.2}$$


It is easy to see that these probabilities represent the consecutive
terms of Newton's binomial expansion:

$$(q + p)^n = q^n + npq^{n-1} + \ldots + p^n. \tag{8.3}$$

The probability that the event will occur k times is equal to the term
of the expansion containing $p^k$. Therefore, the table of values for
$P_{k,n}$ for all values of k is called the probability distribution.

Since p + q = 1, it follows from the last equation that

$$P_{0,n} + P_{1,n} + \ldots + P_{n,n} = 1. \tag{8.4}$$


The same equation follows from formula (7.6) since the outcomes
represented by no occurrence, one occurrence, two occurrences,
..., n occurrences of the event exhaust all the possible cases and
one of them must come about.


Example 1. Suppose that p = 0.4, that q = 0.6, and that n = 5. Consequently, k may
assume the values 0, 1, 2, 3, 4, and 5. We compute $P_{k,n}$ and present the probability
distribution in the form of a table:

k          0         1         2         3         4         5
P_{k,n}    0.07776   0.25920   0.34560   0.23040   0.07680   0.01024

For a check, we add up all the numbers $P_{k,n}$. The result is unity, as we would expect.
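The table and its check can be regenerated from the binomial terms. The sketch below works in exact fractions and converts to decimals only for display; the function name `binomial_distribution` is an assumed name.

```python
from fractions import Fraction
from math import comb

def binomial_distribution(n, p):
    """List of P_{k,n} for k = 0, 1, ..., n (the terms of (q + p)**n)."""
    q = 1 - p
    return [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

dist = binomial_distribution(5, Fraction(2, 5))  # p = 0.4, q = 0.6, n = 5
for k, value in enumerate(dist):
    print(k, f"{float(value):.5f}")

print(sum(dist))  # 1
```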


The probability distribution can be represented on a graph. On
the abscissa, we lay off numbers from 0 to n representing the
possible numbers of times that the event may occur and we lay off
the probabilities on the ordinate. It should be noted that the graph
consists of individual points since the number of occurrences is not
a continuous variable but takes only integral values from 0 to n.
In such problems, we say that k assumes discrete values (in the
present problem, nonnegative integers). Figure 1 shows the graph of
the example considered above. The same graph may be constructed
in a slightly different manner by letting the abscissa represent not
the number of occurrences but its ratio to the number of trials,
k/n, and letting the ordinate represent the products of the probabilities
multiplied by the number of trials (Fig. 2).


Fig. 1. The probability distribution in the problem of repeated trials. In this figure,
the number of trials n = 5 and the probability of the event p = 0.4.

Fig. 2. The probability distribution in the problem of repeated trials. The number of
trials n = 5 and the probability of the event p = 0.4.


The points on the graph correspond to various numbers of 
times (from 0 to n) that the event occurs. The points are joined by 
dashed rather than solid lines to emphasize that the graph consists 
of individual points. 

This method of construction is convenient in that the base of the
graph is always equal to a unit of length on the abscissa and the
ordinates will not be too small. Having constructed a graph by the
second method (Fig. 2), let us lay off on both sides of the base
(which is a segment of unit length extending to the right from the
coordinate origin) a segment of length 1/n and let us construct a
continuous broken line as shown in Figure 2. It is easy to show
that the area bounded by this broken line and the part of the
abscissa that it intersects is equal to unity. Since this area is equal
to the sum of the areas of two triangles and n trapezoids, we have

$$S = \frac{1}{2}\cdot\frac{1}{n}\, nP_{0,n} + \frac{1}{2}\cdot\frac{1}{n}\, n(P_{0,n} + P_{1,n}) + \frac{1}{2}\cdot\frac{1}{n}\, n(P_{1,n} + P_{2,n}) + \ldots + \frac{1}{2}\cdot\frac{1}{n}\, n(P_{n-1,n} + P_{n,n}) + \frac{1}{2}\cdot\frac{1}{n}\, nP_{n,n} = P_{0,n} + P_{1,n} + \ldots + P_{n,n} = 1.$$
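The unit area of the broken line can also be checked numerically. The sketch below is an illustration under our own naming (`broken_line_area` is a hypothetical helper): it places vertices at the abscissas k/n with ordinates equal to n times the probability, adds the two end points at distance 1/n from the base, and sums the areas of the triangles and trapezoids.

```python
from math import comb

def broken_line_area(n, p):
    """Area under the broken line of the second construction
    (abscissa k/n, ordinate n * P_{k,n}); hypothetical helper."""
    q = 1.0 - p
    probs = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    # Vertices: (-1/n, 0), then (k/n, n * P_{k,n}) for k = 0..n, then (1 + 1/n, 0).
    xs = [-1.0 / n] + [k / n for k in range(n + 1)] + [1.0 + 1.0 / n]
    ys = [0.0] + [n * pk for pk in probs] + [0.0]
    # Each strip is a trapezoid; the two end strips degenerate into triangles.
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))

print(broken_line_area(5, 0.4))
```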
Example 1. Show the probability distribution when there are five trials if p = 1/2.

    k          0      1      2      3      4      5
    P_{k,5}    1/32   5/32   10/32  10/32  5/32   1/32

Example 2. Show the probability distribution when there are four trials if p = 2/3.
Taking the fourth power of 1/3 + 2/3 according to the binomial theorem, we obtain

    k          0      1      2      3      4
    P_{k,4}    1/81   8/81   24/81  32/81  16/81


We shall now demonstrate certain general properties of the
binomial probability distribution just obtained. Since this
distribution is discrete, the usual analytical techniques are not
applicable for studying $P_{k,n}$ as a function of k. In order to carry out
our investigation, we find the ratios of the probabilities of adjacent
values of k:

$$\frac{P_{1,n}}{P_{0,n}}, \quad \frac{P_{2,n}}{P_{1,n}}, \quad \frac{P_{3,n}}{P_{2,n}}, \quad \ldots, \quad \frac{P_{k,n}}{P_{k-1,n}}, \quad \frac{P_{k+1,n}}{P_{k,n}}, \quad \ldots, \quad \frac{P_{n-1,n}}{P_{n-2,n}}, \quad \frac{P_{n,n}}{P_{n-1,n}}.$$

If we use the basic formula to calculate the general term of this
sequence, we obtain

$$\frac{P_{k+1,n}}{P_{k,n}} = \frac{n-k}{k+1} \cdot \frac{p}{q}.$$


It follows from this expression that the terms in the sequence of 
ratios decrease monotonically with increasing k. Therefore, we 
note the following basic cases: 


1. The First Ratio Does Not Exceed Unity. 


In this case, all the remaining ratios are less than unity:

$$\frac{P_{1,n}}{P_{0,n}} \le 1, \quad \frac{P_{2,n}}{P_{1,n}} < 1, \quad \ldots, \quad \frac{P_{k+1,n}}{P_{k,n}} < 1, \quad \ldots, \quad \frac{P_{n,n}}{P_{n-1,n}} < 1.$$

Therefore,

$$P_{0,n} \ge P_{1,n} > P_{2,n} > \ldots > P_{n-1,n} > P_{n,n}.$$


(1a) If $P_{1,n}/P_{0,n} < 1$, the binomial distribution probabilities
decrease monotonically and the largest probability will be for
k = 0; that is, the most likely number of times that the event will
occur is 0. The graph of the distribution for this case is shown in
Figure 3 (1a).

(1b) If the first ratio is equal to unity, the first two probabilities
in the binomial distribution are equal to each other and the
succeeding probabilities decrease monotonically. Here, the two
most likely numbers of times that the event will occur are 0 and 1,
and these have equal probabilities. The graph of this distribution
is shown in Figure 3 (1b).


2. The First Ratio Is Greater Than Unity And The Last Is Less Than 
Unity. 


In this case, because of the monotonic decrease of the ratios, 
somewhere in the sequence there must be one jump from a ratio 
exceeding unity to a ratio less than unity. Since the probabilities 
are discrete numbers, their ratios are also discrete. Therefore, 
there may be two subcases: 

(2a) Some ratio is greater than unity and the following one is 
less than unity. 

(2b) One of the ratios is exactly equal to unity (in which case 
the preceding one is greater and the following one is less than 
unity). 

Let us consider these subcases separately.

(2a) We have

$$\ldots, \quad \frac{P_{x-1,n}}{P_{x-2,n}} > 1, \quad \frac{P_{x,n}}{P_{x-1,n}} > 1; \qquad \frac{P_{x+1,n}}{P_{x,n}} < 1, \quad \frac{P_{x+2,n}}{P_{x+1,n}} < 1, \quad \ldots;$$

hence,

$$\ldots < P_{x-2,n} < P_{x-1,n} < P_{x,n} > P_{x+1,n} > P_{x+2,n} > \ldots$$

From these inequalities, it is clear that the probability distribution
has a maximum at k = x. To decide what this most likely number
of times of occurrence, x, is, we replace the probabilities in the
two inequalities

$$\frac{P_{x,n}}{P_{x-1,n}} > 1, \qquad \frac{P_{x+1,n}}{P_{x,n}} < 1$$

by the expressions given for them in the basic formula. Then, we
obtain

$$\frac{n-x+1}{x} \cdot \frac{p}{q} > 1, \qquad \frac{n-x}{x+1} \cdot \frac{p}{q} < 1$$


Fig. 3. Forms of the probability distribution in the problem of
repeated trials for n = 4 and various values of p. Panels: 1a) p = 0.1;
1b) p = 0.2; 2) p = 0.4; 3a) p = 0.9; 3b) p = 0.8. In each panel the
abscissa is k and the ordinate is $P_{k,4}$.

    Case   p      k = 0    k = 1    k = 2    k = 3    k = 4
    1a     0.1    0.66     0.29     0.05     0.0036   0.0001
    1b     0.2    0.41     0.41     0.15     0.03     0.0016
    2      0.4    0.13     0.35     0.35     0.15     0.02
    3a     0.9    0.0001   0.0036   0.05     0.29     0.66
    3b     0.8    0.0016   0.03     0.15     0.41     0.41


or

$$xp + xq < np + p, \qquad xp + xq > np - q.$$

Since p + q = 1, we have

$$np - q < x < np + p.$$

The difference between these bounds for x is equal to

$$np + p - (np - q) = p + q = 1.$$

Therefore, if np + p (and hence np - q) is a fraction, there will be
only one integer between them. It will be equal to the integral part
of np + p, and this is the most likely number of occurrences. The
graph of such a distribution is shown in Figure 3 (2).

Under the hypotheses of subcase (2a), that is, under the
assumption of a unique maximum in the distribution, the numbers
np + p and np - q are fractions. If np + p is an integer, np - q will
also be an integer (1 less than np + p) and the problem would seem
to have no solution, but, as we shall soon see, this is not the case.

Note that subcase (1a), in which the probabilities decrease
monotonically, also satisfies the inequalities obtained above. For
if

$$1 > np + p > 0, \qquad np - q < 0,$$

then x = 0.
Thus, we obtain a simple condition for a maximum at the
beginning of the distribution:

$$p < \frac{1}{n+1} \quad \text{or} \quad n < \frac{q}{p}.$$


Let us examine subcase (2b). Here, we assume that

$$\frac{P_{x,n}}{P_{x-1,n}} > 1, \qquad \frac{P_{x+1,n}}{P_{x,n}} = 1, \qquad \frac{P_{x+2,n}}{P_{x+1,n}} < 1.$$

Hence,

$$\ldots < P_{x-1,n} < P_{x,n} = P_{x+1,n} > P_{x+2,n} > \ldots$$

Thus, there are two adjacent numbers, x and x + 1, of times that
the event may occur which have probabilities greater than both the
numbers that are less than x and the numbers that are greater
than x + 1. The distribution has a dense maximum; that is, two
adjacent ordinates appear as maxima on the graph (see Fig. 3).
Since the entire pattern of distribution that we have described
follows from the fact that one of the ratios is equal to unity, this
is the basic equation for finding the most likely numbers of times
of occurrence in this subcase. We have

$$\frac{P_{x+1,n}}{P_{x,n}} = \frac{n-x}{x+1} \cdot \frac{p}{q} = 1,$$

so that

$$x = np - q, \qquad x + 1 = np + p.$$

Since the number of times of occurrence must be an integer, it
follows from these equations that we obtain subcase (2b) if np + p
and np - q are integers.

Case (1b) is included in these equations. For if np - q = 0 and
np + p = 1, it follows that x = 0 and x + 1 = 1.


3. The Last Ratio is Equal to or Greater Than Unity. 


In this case,

$$\frac{P_{n,n}}{P_{n-1,n}} \ge 1.$$

Since the ratios decrease, all the other ratios are greater than
unity. Therefore,

$$P_{0,n} < P_{1,n} < P_{2,n} < \ldots < P_{n-1,n} \le P_{n,n}.$$

(3a) If $P_{n-1,n} < P_{n,n}$, we have a monotone increasing
probability distribution and the most likely number of times that the
event will occur is n (which is the number of trials) (see Fig. 3).
This subcase is included in the equations for subcase (2a). For if
np + p is a fraction greater than n and if np - q < n, we have x = n
and, consequently,

$$p > \frac{n}{n+1} \quad \text{or} \quad n < \frac{p}{q}.$$

(3b) If $P_{n-1,n} = P_{n,n}$, the distribution at the end has two adjacent
identical maxima, which will occur if

$$p = \frac{n}{n+1}$$

(see curve 3b, Fig. 3).

Subcase (3b) is included in subcase (2b). For if np + p = n, then
np - q = n - 1. Thus, the conditions for case (2) can be considered
the most general since they automatically include cases (1)
and (3). From what has been said, we have the following rule for
determining the most likely number or numbers of times that the
event will occur.

Compute the value of

np + p.

If this number is a fraction, the integral part of it is equal to the
most likely number of times of occurrence. If np + p is an integer,
the numbers np + p and np + p - 1 (that is, np + p and np - q) are
the two most likely numbers of times that the event will occur, and
they have a common probability.

The most likely (most probable) number of times of occurrence 
is the number of times whose probability is greater than the 
probability of each other single possible number of times of 
occurrence. Note that it does not at all follow from this that it is 
likely that the event will take place exactly the most probable 
number of times. On the contrary, in almost all problems, it is 
less likely that the event will occur exactly that many times than 
that it will not occur exactly that number of times. 
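The rule just stated is easy to mechanize. The sketch below is an illustration only: the function name `most_likely_counts` is ours, and it assumes that p is supplied as an exact fraction so that the test "np + p is an integer" is not spoiled by rounding.

```python
from fractions import Fraction
from math import floor

def most_likely_counts(n, p):
    """Most likely number(s) of occurrences in n trials, by the rule
    'compute np + p' (hypothetical helper; p should be an exact Fraction)."""
    p = Fraction(p)
    upper = n * p + p                      # the bound np + p
    if upper.denominator == 1:             # np + p is an integer
        x = int(upper)
        # Two candidates, np - q and np + p; keep those inside 0..n.
        return [c for c in (x - 1, x) if 0 <= c <= n]
    return [floor(upper)]                  # integral part of np + p

print(most_likely_counts(15, Fraction(2, 3)))
print(most_likely_counts(9, Fraction(3, 5)))
```

The filter on the range 0..n is our own addition; it covers the limiting values p = 0 and p = 1, which the text does not discuss.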


Example 1. Determine the most likely number of occurrences when n = 15 and
p = 2/3. We calculate the bounds between which the most likely number of occurrences
lies:

$$np - q = 9\tfrac{2}{3}, \qquad np + p = 10\tfrac{2}{3}.$$

Since the bounds are mixed numbers, we have only one most likely number of occurrences:

$$10\tfrac{2}{3} > x > 9\tfrac{2}{3}.$$

Consequently, x = 10.

Example 2. Determine the most likely number of occurrences when n = 9 and
p = 0.6.

We calculate the bounds

$$np + p = 6, \qquad np - q = 5.$$

Since these bounds are integers, we have two most likely numbers of times of occurrence,
namely, 5 and 6.
In this example, let us calculate the probability of one of these most likely numbers:

$$P_{5,9} = \frac{9 \cdot 8 \cdot 7 \cdot 6}{1 \cdot 2 \cdot 3 \cdot 4}\,(0.6)^5 (0.4)^4 \approx 0.25.$$

It is clear from the last example that the most likely number of times of occurrence
has a small probability even when the number of trials is relatively small.


It is thus more probable that the number of occurrences will be 
some number other than 5 than that it will be 5 (probability of 0.75 
to 0.25). This illustrates the remark made just before example 1. 

It can be shown that for the number of times of occurrence to 
be one of several numbers close to the most likely one is much 
more probable than for it to be certain other numbers. To show 
this, it will be useful to solve some problems of the following 
type: 

Suppose that the number of trials is 8 and that the probability 
of the event is equal to 1/4. Compare the probabilities of the two 
events: 

(a) The number of occurrences will be 1, 2, or 3 (all numbers 
close to the most likely number of occurrences); 

(b) the number of occurrences will be 5, 6, or 7. 

It will also be instructive to compare the probability of the event 
(a) with the probability of the event occurring some number of 
times other than 1, 2, or 3. 
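A problem of this type can be worked out directly from the basic formula. The following sketch is ours (the helper name `prob_of_counts` is hypothetical); it uses exact fractions so that the two probabilities can be compared without rounding error.

```python
from fractions import Fraction
from math import comb

def prob_of_counts(n, p, counts):
    """Probability that the number of occurrences in n trials is one of
    the listed counts (hypothetical helper; exact arithmetic)."""
    p = Fraction(p)
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k) for k in counts)

near = prob_of_counts(8, Fraction(1, 4), [1, 2, 3])   # event (a)
far = prob_of_counts(8, Fraction(1, 4), [5, 6, 7])    # event (b)
other = 1 - near                                      # any count other than 1, 2, 3
print(float(near), float(far), float(other))
```

With these inputs, event (a) has a probability of about 0.786 and event (b) about 0.027, so counts close to the most likely number are far more probable than counts remote from it.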




38. LAPLACE’S APPROXIMATION FORMULA FOR 
CALCULATING THE PROBABILITIES OF THE 
POSSIBLE NUMBER OF TIMES OF 
OCCURRENCE OF AN EVENT 


In Section 36, we derived formula (8.1)

$$P_{k,n} = \frac{n!}{k!\,(n-k)!}\, p^k q^{n-k}$$

for calculating the probability of an event C occurring exactly k
times in n trials if the trials are independent and the probability of C
is equal to p in each trial.

If n is large, it is not convenient to make calculations on the
basis of this formula. Therefore, we shall give an approximation
formula for calculating $P_{k,n}$ that can be used if n exceeds 10 or 20.

Stirling's formula for approximating the values of factorials is
familiar from analysis:

$$m! \approx \sqrt{2\pi}\; m^m e^{-m} \sqrt{m}.$$ (8.5)


Calculations made on the basis of this formula yield a small
relative error even at small values of m. As the value of m increases,
the accuracy also increases. To illustrate this, we give a
comparison of the exact values of certain factorials with the
approximate values calculated from Stirling's formula:

    m                  4        7        10         20
    Exact m!           24       5040     3628800    2432.9·10^15
    Approximate m!     23.51    4980.4   3598700    2422.7·10^15
    Relative error     0.020    0.012    0.008      0.004
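The comparison can be repeated with a few lines of code. This is a minimal sketch; the function name `stirling` is ours, and it evaluates formula (8.5) in the equivalent form sqrt(2 pi m) (m/e)^m.

```python
from math import e, factorial, pi, sqrt

def stirling(m):
    """Stirling's approximation (8.5) to m! (hypothetical helper)."""
    return sqrt(2 * pi * m) * (m / e) ** m

for m in (4, 7, 10, 20):
    exact = factorial(m)
    approx = stirling(m)
    # Relative error, taken as (exact - approximate) / exact.
    print(m, exact, approx, (exact - approx) / exact)
```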


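If we substitute the values of the factorials n!, k!, and (n - k)!
obtained from Stirling's formula into formula (8.1), we obtain,
after some elementary manipulations,

$$P_{k,n} \approx \frac{1}{\sqrt{2\pi}} \sqrt{\frac{n}{k(n-k)}} \left(\frac{np}{k}\right)^{k} \left(\frac{nq}{n-k}\right)^{n-k}.$$

Let us replace the number k of occurrences in this formula with
the deviation u of the number of occurrences from a number np
that is close to the most likely number of occurrences:

$$u = k - np.$$

Then,

$$P_{k,n} \approx \frac{1}{\sqrt{2\pi}} \cdot \frac{1}{\sqrt{npq\left(1+\frac{u}{np}\right)\left(1-\frac{u}{nq}\right)}} \left(\frac{np}{np+u}\right)^{np+u} \left(\frac{nq}{nq-u}\right)^{nq-u}.$$

If we divide the numerator and denominator of the third and fourth
factors by np and nq respectively, we may write

$$P_{k,n} \approx \frac{1}{\sqrt{2\pi}} \cdot \frac{1}{\sqrt{npq}} \cdot \frac{1}{\sqrt{\left(1+\frac{u}{np}\right)\left(1-\frac{u}{nq}\right)}} \left(1+\frac{u}{np}\right)^{-(np+u)} \left(1-\frac{u}{nq}\right)^{-(nq-u)}.$$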

Up to now, apart from the use of Stirling's formula, we have made
only identity transformations. Let us now suppose that the number of
trials n is large and that the number of occurrences k differs only
slightly from np. Furthermore, let us assume that p is not close
either to zero or to unity. It follows from these assumptions that
the number u is small in comparison with np and nq. On the basis
of these assumptions, we find the expression for $P_{k,n}$. Since
u/(np) and u/(nq) are, by hypothesis, small fractions, the third
factor can be replaced with unity. Note that we are neglecting the
number

$$\frac{p-q}{2} \cdot \frac{u}{npq}$$

and the second and higher powers of the fractions u/(np) and
u/(nq). This is easy to show if we write the binomial expansions of

$$\left(1+\frac{u}{np}\right)^{-1/2} \quad \text{and} \quad \left(1-\frac{u}{nq}\right)^{-1/2}.$$

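To simplify the fourth factor, we use the identity

$$\ln\left(1+\frac{u}{np}\right)^{-(np+u)} = -(np+u)\,\ln\left(1+\frac{u}{np}\right).$$

Expanding the logarithm on the right in powers of u/(np), we
obtain

$$\ln\left(1+\frac{u}{np}\right)^{-(np+u)} = -np\left(1+\frac{u}{np}\right)\left(\frac{u}{np} - \frac{u^2}{2n^2p^2} + \frac{u^3}{3n^3p^3} - \ldots\right) = -u - \frac{u^2}{2np} + \ldots;$$

hence,

$$\left(1+\frac{u}{np}\right)^{-(np+u)} \approx e^{-u-\frac{u^2}{2np}}.$$

In the same way, we transform the last factor:

$$\ln\left(1-\frac{u}{nq}\right)^{-(nq-u)} = -nq\left(1-\frac{u}{nq}\right)\left(-\frac{u}{nq} - \frac{u^2}{2n^2q^2} - \ldots\right) = u - \frac{u^2}{2nq} + \ldots$$

Thus,

$$\left(1-\frac{u}{nq}\right)^{-(nq-u)} \approx e^{u-\frac{u^2}{2nq}}.$$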

If, in the fourth and fifth factors, we keep only the second powers of
u, we obtain the approximating formula for $P_{k,n}$:

$$P_{k,n} \approx \frac{1}{\sqrt{2\pi}} \cdot \frac{1}{\sqrt{npq}}\; e^{-\frac{u^2}{2npq}}.$$ (8.6)

If we now replace u with its value u = k - np, we have Laplace's
formula in closed form:

$$P_{k,n} \approx \frac{1}{\sqrt{2\pi}\,\sqrt{npq}}\; e^{-\frac{(k-np)^2}{2npq}}.$$ (8.7)
To simplify the writing of Laplace's formula, we introduce, in
place of the number of occurrences k, the random variable z, defined by

$$z = \frac{k - np}{\sqrt{npq}}.$$

To two values of k that differ by unity there correspond,
according to this formula, the following values of z:

$$z = \frac{k - np}{\sqrt{npq}} \quad \text{and} \quad z + \Delta z = \frac{k + 1 - np}{\sqrt{npq}}; \quad \text{hence} \quad \Delta z = \frac{1}{\sqrt{npq}}.$$

If we now substitute z and Δz into formula (8.7), we reduce it to the
following simple form:

$$P_{k,n} \approx \frac{1}{\sqrt{2\pi}}\; e^{-\frac{z^2}{2}}\, \Delta z,$$ (8.8)

where

$$z = \frac{k - np}{\sqrt{npq}}, \qquad \Delta z = \frac{1}{\sqrt{npq}}.$$ (8.9)

The function

$$\frac{1}{\sqrt{2\pi}}\; e^{-\frac{z^2}{2}}$$

is encountered in many probability-theory problems. It is called
the Laplace-Gauss function. A table of values of this function
from z = 0.00 to z = 3.00 with a step of 0.01 is given in the back of
the present book.

This table can easily be used in conjunction with formulas (8.8) 
and (8.9) to calculate approximate values for the probability that an 
event will occur a certain number of times. 

Formula (8.7) gives rather exact values of $P_{k,n}$ even for small
values of n if p is close to 0.5. As an example, we use the table of
values of $P_{k,n}$ for n = 20, p = 0.4, and q = 0.6. We confine ourselves
to three decimal places:*

    k                          2      4      6      8      10     12     14     16
    Exact values of P_{k,n}    0.003  0.035  0.124  0.180  0.117  0.036  0.005  0.000
    Approximate values         0.004  0.035  0.120  0.182  0.118  0.035  0.004  0.000

It is clear from this table that the error does not exceed 0.004.
The relative error is less for values of k that are close to np = 8,
that is, that are close to the most likely number of occurrences.
For values of k that are appreciably different from the most likely
number, the approximating formula gives less satisfactory values
for the probability. This is to be expected from the simplifications
that were made in deriving the formula, which assumed that k differs
only slightly from np.
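The comparison for n = 20 and p = 0.4 can be recomputed from formulas (8.1), (8.8), and (8.9). The sketch below is an illustration with hypothetical function names; because it evaluates the exponential directly instead of reading the Laplace-Gauss function from a three-place table, its approximate values may differ from the printed ones in the last decimal place.

```python
from math import comb, exp, pi, sqrt

def exact_prob(k, n, p):
    """Exact probability from the basic formula (8.1)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def laplace_prob(k, n, p):
    """Laplace's approximation in the form (8.8)-(8.9)."""
    npq = n * p * (1 - p)
    z = (k - n * p) / sqrt(npq)
    dz = 1 / sqrt(npq)
    return exp(-z * z / 2) / sqrt(2 * pi) * dz

for k in range(2, 17, 2):
    print(k, round(exact_prob(k, 20, 0.4), 3), round(laplace_prob(k, 20, 0.4), 3))
```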

Formula (8.7) makes it easy to investigate the probability
distribution. According to this formula, $P_{k,n}$ has a maximum at
k = np. This approximate value $k_m$ of the most likely number of
occurrences can differ from the actual value by an amount not
exceeding a proper fraction (p or q), since the most likely
number of occurrences lies between the numbers np - q and np + p
or is equal to each of these numbers in the case of two most likely
numbers of times of occurrence.

Substituting k = np in (8.7) we get the following formula for the
approximate value of the probability of the most likely number of
occurrences:

$$P_{k_m,n} \approx \frac{1}{\sqrt{2\pi}} \cdot \frac{1}{\sqrt{npq}}.$$

If n = 9, p = 0.6, and q = 0.4, we obtain from this formula $k_m$ = 5.4
(instead of exactly 5 or 6) and $P_{k_m,n}$ = 0.271 (instead of 0.25 in
example 2 on page 122).


39. AN APPROXIMATING CURVE FOR THE PROBABILITY 
DISTRIBUTION 


From formula (8.7), it is also easy to draw an approximate
graph of the probability distribution for the various numbers of
times that an event can occur. Let us lay off the values of $y = nP_{k,n}$
along the ordinate and the numbers x = k/n - p along the abscissa;
that is, we place the coordinate origin in a position corresponding
to the approximation of the most likely number of times of
occurrence. Then, the equation of the curve approximately representing
the probability distribution function can be written in the form

$$y = \frac{1}{\sqrt{2\pi}} \sqrt{\frac{n}{pq}}\; e^{-\frac{n x^2}{2pq}}.$$ (8.10)


*The use of the term "exact" means that the figures are correct to three decimal
places, not that these are the absolutely exact values of $P_{k,n}$.


This curve is symmetric about the ordinate and has a maximum on
it. It has two inflection points at $x = \pm\sqrt{pq/n}$. The x-axis is an
asymptote of this curve. The ordinates of the curve are easily
calculated if we use Table 1 for the function
$\frac{1}{\sqrt{2\pi}} e^{-z^2/2}$. In the present case,

$$z = \frac{x}{\sqrt{pq/n}}.$$

The curve representing the case in which n = 5, p = 0.5, and q = 0.5
is shown in Figure 4. The figure shows that even when n = 5, the
approximating curve represents the exact graph of the distribution
satisfactorily.


Fig. 4. Comparison of the values of the probabilities $P_{k,n}$ calculated from the
exact formula (indicated by the crosses) for n = 5 and p = q = 0.5 and from
Laplace's approximating formula (dots). The dashed curve was constructed from
formula (8.7).


40. POISSON'S DISTRIBUTION (THE LAW OF RARE EVENTS) 


The derivation of the approximating formula for $P_{k,n}$ that was
given in Section 38 is not satisfactory if p is very small (or very
close to unity). To obtain an approximate expression for $P_{k,n}$ in
the case of small values of p, we rewrite the basic formula in
the form

$$P_{k,n} = \frac{n(n-1)\cdots(n-k+1)}{k!} \left(\frac{a}{n}\right)^{k} \left(1 - \frac{a}{n}\right)^{n-k},$$

where a = np. The denominators of the first two factors are
switched and the denominator $n^k$ is written as n·n·...·n. The
first factor is then further factored so that we have

$$P_{k,n} = \frac{n}{n} \cdot \frac{n-1}{n} \cdots \frac{n-k+1}{n} \cdot \frac{a^k}{k!} \left(1 - \frac{a}{n}\right)^{n} \left(1 - \frac{a}{n}\right)^{-k}.$$

If k and a are fixed, p will approach 0 as n approaches ∞.
Therefore,

$$\lim_{n\to\infty} \frac{n-i}{n} = 1 \quad (i = 0, 1, \ldots, k-1), \qquad \lim_{n\to\infty} \left(1 - \frac{a}{n}\right)^{-k} = 1, \qquad \lim_{n\to\infty} \left(1 - \frac{a}{n}\right)^{n} = e^{-a}.$$

If k and a are small in comparison with n and if p is small in
comparison with unity, all the factors in the last equation for $P_{k,n}$
except $a^k/k!$ can be replaced with their limiting values. Then,

$$P_{k,n} \approx \frac{a^k e^{-a}}{k!}.$$ (8.11)

Obviously, an analogous distribution can be constructed for the case
in which p is close to unity.

Poisson's distribution (8.11) is used in problems dealing with
rare events (the emission of beta rays, weak solutions, etc.). A
table of the Poisson distribution function is given in the back of the
book for values of the arguments a and k.
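How closely (8.11) follows the basic formula can be seen from a small numerical comparison. The sketch below is ours: the function names are hypothetical, and the values n = 1000 and p = 0.002 (so that a = np = 2) are chosen only for illustration and do not come from the text.

```python
from math import comb, exp, factorial

def binomial_prob(k, n, p):
    """Exact probability from the basic formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_prob(k, a):
    """Poisson's approximation (8.11) with a = np."""
    return a**k * exp(-a) / factorial(k)

n, p = 1000, 0.002          # illustrative values: many trials, a rare event
a = n * p
for k in range(6):
    print(k, round(binomial_prob(k, n, p), 5), round(poisson_prob(k, a), 5))
```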


Chapter 9 
DISCRETE RANDOM VARIABLES 


41. RANDOM VARIABLES 


In the study of natural occurrences, we often encounter quantities 
whose numerical values we cannot state in advance (that is, before 
observation), even though we may know certain conditions under 
which these occurrences will take place. 

For example, we know that every measurement is accom- 
panied by some errors, including random errors. Or, for ex- 
ample, we do not know the characteristics (color index, parallax, 
etc.) of a star taken at random from a Catalog until they are 
measured.* 

Consider the set of small planets, the elements of the orbits 
having been determined for each of them. When we study the 
general properties of such a set, we consider these elements 
as random for an arbitrarily chosen asteroid. This assertion 
should be understood as follows: In every set of asteroids, each 
of the elements, for example, the mean distance, has quite dif- 
ferent values. If we choose an arbitrary (undefined) planet, we 
cannot say in advance what its mean distance is. Quantities of 
this sort are said to be random. Quantities whose values can be 
stated are customarily called defined quantities. 

We also speak of random variables even in those cases when 
the phenomenon is thoroughly studied and the law of its behavior 
is known. From Ohm’s law, we can predict the current if we 
know the voltage and resistance, but measurements may show a 
slight deviation from the predicted value. Such a deviation is an 
error in measurement (some part of which is brought about by 
random causes and cannot be predicted in advance). The question 
of random errors in measurements is therefore studied in prob- 
ability theory. 

Random variables can be of various types. Let us consider 
two types. We shall call a random variable discrete if it can take 
only a finite or countable set of values. 


*To understand the last example properly, we should take into account the fact that the 
physical characteristics of the star itself are not random but are determined by its 
origin and evolution, They can be treated as random if we choose the star in a random 
manner, 




In the problem of repeated trials, we met a classical example of a discrete random
variable, one having only integral values. For example, if a hundred trials are made,
the number of occurrences of the event is a random variable assuming values from 0
to 100. The probabilities of these values can be calculated from an exact or
approximate formula.

Suppose that lottery tickets are issued at five dollars each and that some of the
winning tickets pay one hundred dollars, some pay fifty dollars, and some pay ten
dollars. The other tickets pay nothing. The profit made by the owner of such a lottery
ticket is a random variable whose values are 95, 45, 5, or -5 dollars. These numbers
are obtained by subtracting the original cost of the ticket from what the ticket pays.
The fourth value represents a ticket that does not pay off. It must be included in the set
of values for this set to be complete.

Another example of a discrete random variable (again with integral values) is the
number of stars in multiple systems of stars. If we consider the set of all multiple
systems, the number of stars in a system can be 2, 3, 4, ..., and will be random with
respect to the choice of an arbitrary member of the set.


A continuous random variable can assume any value in some
definite interval of values. The only random variables of this type
that we shall examine are those for which it is possible to
determine the probability of taking a particular value in some given
interval. This interval, containing all possible values of a
continuous random variable, is called its range. Some simple
examples of the continuous type of random variable are the absolute
magnitudes of stars of a particular spectral class, elements of
the orbits of small planets, errors in measurements, etc. In the
first two examples, the element of chance appears in the study of
the entire set of objects when an arbitrarily and randomly chosen
object is examined.


In this chapter, we shall consider only discrete random variables with a finite number
of possible values. We shall also assume that these values of the random variable are
the only ones possible and that they are mutually exclusive, that is, that the entire set of
values is known and that the occurrence of one of them excludes the occurrence of all
other values.


From the point of view of probability theory, a discrete random
variable X is defined if the probabilities of its individual
numerical values are given.

Let us denote the probabilities of the consecutive values
$x_1, x_2, \ldots, x_n$ by $p_1, p_2, \ldots, p_n$, and let us write the data in the
form of a so-called distribution table:*

    numerical values of X    x_1, x_2, ..., x_n
    probabilities            p_1, p_2, ..., p_n

With these notations, X (without subscript) essentially denotes a
function, whereas x denotes its arbitrary value. Since we are
assuming that all possible values $x_k$ are listed, from the theorem
on addition of probabilities, we obtain

$$p_1 + p_2 + \ldots + p_n = 1.$$


*We have already encountered examples of such tables in the preceding chapter.


A defined quantity, of course, has one value and the probability 
of that value is equal to unity. 

Definition. Two random variables that come up in a single 
problem are said to be mutually independent if the probability of 
each value of one of them is independent of the value assumed by 
the other. 

It is only in the simplest problems that this definition can be 
applied with sufficient justification. For example, if two players 
throw a pair of dice each, the number of points that will appear on 
the dice will be random and independent. It is much more difficult 
to establish independence in physical problems. If we examine 
such characteristics as the absolute magnitudes and radial velocities
of the stars belonging to a particular set of stars, there is no
physical reason for believing that there is a (probability) con- 
nection between these values. However, we cannot be completely 
sure that no such connection exists. If quantities like these are 
considered together, we must use caution with regard to the as- 
sumption of their independence. We may also note that in a number 
of problems the assumption of independence is a hypothesis that 
must be checked by observation. 

If we denote the values of a random variable X by $x_r$ (for
r = 1, 2, ..., m) and the values of Y by $y_s$ (for s = 1, 2, ..., n),
the condition of independence of X and Y can be written as follows:

$$P(x_r \mid y_s) = P(x_r), \qquad P(y_s \mid x_r) = P(y_s)$$ (9.1)

for all possible values of r and s.
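Condition (9.1) can be tested directly when the joint probabilities of the pairs of values are known. The following sketch rests on an assumption of ours, namely that the data are given as a table `joint[r][s]` of the probability that X takes its r-th value and Y its s-th value; the function name `is_independent` is likewise hypothetical. Exact fractions are used so that equality can be tested without rounding.

```python
from fractions import Fraction

def is_independent(joint):
    """Check condition (9.1) for a joint table joint[r][s] = P(x_r and y_s)
    (hypothetical helper; entries should be exact Fractions)."""
    p_x = [sum(row) for row in joint]          # P(x_r)
    p_y = [sum(col) for col in zip(*joint)]    # P(y_s)
    for r, row in enumerate(joint):
        for s, p_rs in enumerate(row):
            # P(x_r | y_s) = P(x_r and y_s) / P(y_s) must equal P(x_r).
            if p_y[s] != 0 and p_rs / p_y[s] != p_x[r]:
                return False
    return True

quarter, half = Fraction(1, 4), Fraction(1, 2)
two_coins = [[quarter, quarter], [quarter, quarter]]   # two separate fair coins
same_coin = [[half, 0], [0, half]]                     # Y always repeats X
print(is_independent(two_coins), is_independent(same_coin))
```

It is enough to test the first equality of (9.1): if it holds for all r and s, every joint probability is the product of the two individual probabilities, and the second equality follows.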


42. THE EXPECTATION OF A DISCRETE RANDOM VARIABLE 


Definition. The expectation of a discrete random variable X is the
sum of the products of its numerical values and their respective
probabilities. We denote the expectation by E(X) or $\bar{x}$. (Other
notations are $M_x$ and M(X).) By definition,

$$E(X) = \sum_{k=1}^{n} x_k p_k.$$ (9.2)


To clarify this concept of expectation, let us consider the following problem:

Suppose that one hundred lottery tickets are issued and that they are sold for five
dollars each. Suppose that eighteen of them are winning tickets, two paying fifty dollars,
six paying twenty-five dollars, and ten paying five dollars. The price of the ticket is not
refunded. Determine the expectation of the profit for an owner of a single ticket.

By assumption, this profit is a random variable. Therefore, we need to compile a
table of all the possible values of the profit and the probabilities of these values:

    x    45     20     0      -5
    p    0.02   0.06   0.10   0.82

(The last column represents the case in which the ticket does not pay off.) By definition,
the expectation (in dollars) will be

$$E(X) = 45 \cdot 0.02 + 20 \cdot 0.06 + 0 \cdot 0.10 - 5 \cdot 0.82 = -2.$$

If we calculate the overall profit for all the tickets, we obtain $300 - $500 = -$200 since
the total amount of winning payments is three hundred dollars and the cost of all the
tickets is five hundred dollars. If we divide the total profit by the number of tickets, we
obtain the average profit for a single ticket. This average profit is exactly equal to the
expectation. Both numbers are obtained by the same operations though performed in a
somewhat different order.
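The two ways of computing the average profit can be set side by side in a few lines. The sketch below is only an illustration of definition (9.2); the helper name `expectation` is ours, and exact fractions are used so that the two results can be compared for equality.

```python
from fractions import Fraction

def expectation(values, probs):
    """Expectation (9.2) of a discrete random variable (hypothetical helper)."""
    if sum(probs) != 1:
        raise ValueError("the probabilities must add up to unity")
    return sum(x * p for x, p in zip(values, probs))

profits = [45, 20, 0, -5]
probs = [Fraction(2, 100), Fraction(6, 100), Fraction(10, 100), Fraction(82, 100)]
e_profit = expectation(profits, probs)

# The same number as an arithmetic mean: total winnings minus total cost,
# divided by the number of tickets.
average = Fraction(2 * 50 + 6 * 25 + 10 * 5 - 100 * 5, 100)
print(e_profit, average)
```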


Thus, the expectation corresponds to the arithmetic mean. 
These numbers are equal when we know precisely all the values of 
X and their probabilities. Therefore, the expectation is sometimes 
called the theoretical average value of a random variable. 


Properties of the Expectation. 


I. If the values of a random variable have dimensions, its
expectation will have the same dimensions. This follows immediately
from the definition, since the values of x are multiplied by pure
numbers (the values of the probabilities).

II. The expectation is a positive number if all the values of X
are positive; it can be positive, negative, or zero if the values of
X include negative numbers.

III. The expectation of a defined quantity is equal to its
numerical value.

Since, by hypothesis, X can assume only one defined value c,
the probability of this value is equal to 1:

$$E(X) = c \cdot 1 = c.$$


IV. The expectation of the product of a random variable X and
a fixed number c is equal to the product of this fixed number and
the expectation of the random variable.

Since a random variable is not an ordinary defined number,
we must first come to an agreement as to what the product of a
random variable and a fixed number means. The usual definition
of such a product is as follows: the product of a random variable
and a fixed number is the new random variable whose range
consists of the products formed by multiplying the values in the range
of the original random variable by the fixed number; the
probabilities of these new range values are equal to the probabilities of
the corresponding values of the original random variable. From
this definition, if

    X | x_1, x_2, ..., x_n
      | p_1, p_2, ..., p_n,

the distribution table for the random variable cX will be

    cX | cx_1, cx_2, ..., cx_n
       | p_1,  p_2,  ..., p_n.

Therefore,

$$E(cX) = \sum_{k=1}^{n} c x_k p_k = c \sum_{k=1}^{n} x_k p_k = cE(X).$$

V. If m and M are the smallest and largest of the values of X,

$$m \le E(X) \le M.$$

To show this, write

$$E(X) = x_1 p_1 + x_2 p_2 + \ldots + x_n p_n.$$

Let us replace all the numbers $x_k$ on the right side of this equation
with m. Since the numbers $p_1, p_2, \ldots, p_n$, being probabilities, are
all positive, the right side of the equation cannot increase.
Therefore,

$$E(X) \ge m p_1 + m p_2 + \ldots + m p_n = m(p_1 + p_2 + \ldots + p_n) = m \cdot 1 = m.$$

In the same way, we can show that

$$E(X) \le M.$$

We note that this conditional inequality becomes an equality only if
X is a constant.
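Properties III, IV, and V can be verified on any distribution table. The sketch below is an illustration under our own choices: the helper name `expectation` is hypothetical, the table is the one from the lottery problem of this section, and the multiplier c = 3 is arbitrary.

```python
from fractions import Fraction

def expectation(values, probs):
    """Expectation of a discrete random variable (hypothetical helper)."""
    return sum(x * p for x, p in zip(values, probs))

values = [45, 20, 0, -5]
probs = [Fraction(1, 50), Fraction(3, 50), Fraction(1, 10), Fraction(41, 50)]
c = 3                                                   # arbitrary fixed number

e_x = expectation(values, probs)
e_cx = expectation([c * x for x in values], probs)      # table of cX, same probabilities
e_const = expectation([7], [Fraction(1)])               # a defined quantity

print(e_cx == c * e_x)                    # property IV
print(min(values) <= e_x <= max(values))  # property V
print(e_const)                            # property III
```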


43. THEOREMS ON THE ADDITION AND MULTIPLICATION 
OF EXPECTATIONS 


1. The Addition Theorem. 


The expectation of the sum of two or more random variables is 
equal to the sum of their expectations, 
Suppose that we have two random variables: 


X| x, Xo, . 005 Kye |e Yor over Vp 
Pts Pas vers Dn’ Py Par eens Pry. 


In the general case in which X takes on one of its n values, Y can
take any of its m values. Therefore, the sum X + Y takes on the
mn values that can be obtained by adding a value of X to a value
of Y. The distribution table for X + Y is of the form


\[
\begin{array}{c|ccccccccc}
X + Y & x_1 + y_1 & \dots & x_1 + y_m & x_2 + y_1 & \dots & x_2 + y_m & \dots & x_n + y_1 \ \dots & x_n + y_m \\
      & p_{11} & \dots & p_{1m} & p_{21} & \dots & p_{2m} & \dots & p_{n1} \ \dots & p_{nm}
\end{array}
\]


where the $p_{kl}$ (for k = 1, ..., n and l = 1, ..., m) are the
probabilities of the values $x_k + y_l$, that is, the probabilities of the
combination of $x_k$ with $y_l$. The sum X + Y takes the value $x_k + y_l$ if
both the following two events occur: X takes the value $x_k$ and Y takes
the value $y_l$. Therefore, $p_{kl}$ is the probability of the composite event
composed of two events, namely, the occurrence of $x_k$ and the occurrence of
$y_l$. Since the conditions of the theorem do not stipulate that the




probability of $x_k$ must be independent of the occurrence or
nonoccurrence of $y_l$ (that is, since X and Y may be dependent), we
must use the theorem for the multiplication of probabilities in its
general form. Therefore,


\[ p_{kl} = p_k P(y_l \mid x_k) = p'_l P(x_k \mid y_l), \qquad (9.3) \]


where, as before, $P(x_k \mid y_l)$ is the probability that X will take the
value $x_k$ if it is given that Y takes the value $y_l$. The number of
equations of the form (9.3) is mn. Here, k assumes all integral
values from 1 to n and l assumes all integral values from 1 to m.
From the definition of expectation,


\[ E(X + Y) = \sum_{l=1}^{m} \sum_{k=1}^{n} (x_k + y_l)\, p_{kl}. \]


Let us perform the multiplication indicated in the term to be
summed. Then, let us replace $p_{kl}$ with the middle expression in
equation (9.3) when it is a coefficient of $x_k$ and let us replace it
with the expression on the right of equation (9.3) when it is a
coefficient of $y_l$. If we now break the summation into two parts
by separating the x-terms and the y-terms and reverse the order
of summation in the first part, we obtain


\[ E(X + Y) = \sum_{k=1}^{n} p_k x_k \left\{ \sum_{l=1}^{m} P(y_l \mid x_k) \right\} + \sum_{l=1}^{m} p'_l y_l \left\{ \sum_{k=1}^{n} P(x_k \mid y_l) \right\}. \]


It is easy to show that each of the sums in the braces is equal to 
unity. Specifically, from the theorem on the addition, expressed 
for conditional probabilities, 


\[ \sum_{l=1}^{m} P(y_l \mid x_k) = P(y_1 + y_2 + \dots + y_m \mid x_k). \]


(The addition on the right side is symbolic, meaning the occurrence
of $y_1$, or $y_2$, ..., or $y_m$.) Since the list of values is complete, it is
certain that one of them will occur no matter what the value of $x_k$
is. Therefore,


\[ \sum_{l=1}^{m} P(y_l \mid x_k) = 1 \]

for all values of $x_k$. In an analogous manner, we obtain
\[ \sum_{k=1}^{n} P(x_k \mid y_l) = 1 \]

for all $y_l$. Therefore,


\[ E(X + Y) = \sum_{k=1}^{n} p_k x_k + \sum_{l=1}^{m} p'_l y_l = E(X) + E(Y). \qquad (9.4) \]




It should be noted that no restrictions have been imposed on 
the random variables X and Y in the proof of this theorem. 
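
The remark that no independence is needed can be illustrated on a small joint table. The sketch below is an illustration only: the joint probabilities are hypothetical and deliberately chosen so that X and Y are dependent, yet the expectation of the sum still equals the sum of the expectations.

```python
from fractions import Fraction as F

# Hypothetical joint table p_kl for X, Y in {0, 1}; the variables are
# dependent because P(X=1, Y=1) = 1/4 while P(X=1) * P(Y=1) = 1/8.
joint = {(0, 0): F(1, 2), (0, 1): F(0), (1, 0): F(1, 4), (1, 1): F(1, 4)}

e_x = sum(x * p for (x, y), p in joint.items())
e_y = sum(y * p for (x, y), p in joint.items())
e_sum = sum((x + y) * p for (x, y), p in joint.items())

print("E(X) =", e_x, " E(Y) =", e_y, " E(X+Y) =", e_sum)
print("addition theorem holds:", e_sum == e_x + e_y)
```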


2. The Multiplication Theorem 


The expectation of a product of independent random variables is 
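equal to the product of their expectations.
Suppose that we have the random variables


\[
\begin{array}{c|cccc}
X & x_1 & x_2 & \dots & x_n \\
  & p_1 & p_2 & \dots & p_n
\end{array}
\qquad
\begin{array}{c|cccc}
Y & y_1 & y_2 & \dots & y_m \\
  & p'_1 & p'_2 & \dots & p'_m
\end{array}
\]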


The product of the two random variables is the random variable 
whose values are obtained by multiplying each value of one of the 
random variables by every value of the other. The distribution 
table for the product variable is as follows: 


\[
\begin{array}{c|ccccccccc}
XY & x_1 y_1 & \dots & x_1 y_m & x_2 y_1 & \dots & x_2 y_m & \dots & x_n y_1 \ \dots & x_n y_m \\
   & p_{11} & \dots & p_{1m} & p_{21} & \dots & p_{2m} & \dots & p_{n1} \ \dots & p_{nm}
\end{array}
\]

Since the original random variables are independent, the probability
of the combination of values $x_k$ and $y_l$ (for k = 1, 2, ..., n
and l = 1, 2, ..., m) is equal to the product of their probabilities:
\[ p_{kl} = p_k p'_l. \]


Therefore, 


\[ E(XY) = \sum_{k=1}^{n} \sum_{l=1}^{m} p_k p'_l x_k y_l. \]


If we take $p_k x_k$ from under the inner summation sign, we obtain
\[ E(XY) = \sum_{k=1}^{n} p_k x_k \left( \sum_{l=1}^{m} p'_l y_l \right). \]


The sum in the parentheses is the expectation of Y. That is, it is a
definite number that can be taken from under the first summation
sign. Then, the first sum is the expectation of X. Thus,


\[ E(XY) = E(X)E(Y). \qquad (9.5) \]


This theorem is easily generalized to the case of an arbitrary 
number of mutually independent random variables: 


E(XYZ ...)=E(X)E(Y)E(Z) ... (9.6) 


It should be especially noted that the multiplication theorem is 
valid only for independent random variables. If the random 




variables are dependent, the content of the theorem is considerably
changed since we must then know the probability of $x_k$ given $y_l$ for
every combination of values of k and l.
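
A short numerical contrast may make the role of independence concrete. The following sketch uses fair dice as an assumed example (not taken from the text): for two independent throws the expectation of the product factors, while for the fully dependent case Y = X it does not.

```python
from fractions import Fraction as F
from itertools import product

faces = range(1, 7)
p = F(1, 6)  # probability of each face of a fair die (assumption)

e_die = sum(x * p for x in faces)

# Independent throws: p_kl = p_k * p'_l, as in the proof above.
e_indep = sum(x * y * p * p for x, y in product(faces, faces))

# Fully dependent case Y = X: the product XY is simply X squared.
e_dep = sum(x * x * p for x in faces)

print("E(X)E(Y)          =", e_die * e_die)
print("E(XY), independent =", e_indep)
print("E(XY), Y = X       =", e_dep)
```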


44. THE VARIANCE OF A RANDOM VARIABLE AND
ITS PROPERTIES 


Consider a random variable X, whose expectation we shall denote
by $\bar{x}$.
Definition. The variance of a random variable is the expectation
of the square of the deviation of its values from its expectation.

Suppose that our random variable X has the values $x_1, x_2, \dots, x_n$
and that the probabilities of these values are $p_1, p_2, \dots, p_n$,
respectively. Let us form the new random variable $X - \bar{x}$, whose
values will then be $x_1 - \bar{x}, x_2 - \bar{x}, \dots, x_n - \bar{x}$. The
probabilities of these values are the same as the probabilities of the
values $x_1, x_2, \dots, x_n$ since $\bar{x}$ is a fixed number. The values
$x_1 - \bar{x}, \dots, x_n - \bar{x}$ are called the deviations of the values of
the random variable from its expectation. The new random variable
$X - \bar{x}$ is the deviation of the random variable from its expectation.
Let us square the values $x_1 - \bar{x}, \dots, x_n - \bar{x}$. The results will be
the values of another random variable, namely, the square of the
deviation of the random variable from its expectation. The
probabilities of these values of this variable are the same as the
probabilities of the values of X. Let us make a table of the values
of the square of the deviation as we have done for other random
variables:


\[
\begin{array}{c|cccc}
(X - \bar{x})^2 & (x_1 - \bar{x})^2 & (x_2 - \bar{x})^2 & \dots & (x_n - \bar{x})^2 \\
                & p_1 & p_2 & \dots & p_n
\end{array}
\]


The variance of a random variable X is denoted by var X or $\sigma_x^2$.
By definition,


\[ \operatorname{var} X = \sigma_x^2 = E\{(X - \bar{x})^2\} \qquad (9.7) \]

or
\[ \operatorname{var} X = \sum_{k=1}^{n} p_k (x_k - \bar{x})^2. \]

In some texts the variance is called the dispersion and is denoted
by D.
Properties of the Variance 
I. The variance of a random variable is a nonnegative number;
it vanishes only if the random variable has only one value. This is
true because the numbers


\[ (x_k - \bar{x})^2, \qquad k = 1, 2, \dots, n, \]


are nonnegative and hence var X is a nonnegative number because
of property II of the expectation. The sum


\[ \sum_{k=1}^{n} p_k (x_k - \bar{x})^2 \]
can vanish only if each of the terms is 0, that is, only if
\[ x_1 = x_2 = \dots = x_n = \bar{x}, \]


and in this case X is a constant whose value is $\bar{x}$.

II. The variance of a constant is equal to 0. This is true because
the expectation in this case is equal to that constant and its deviation
is equal to 0 with probability 1. This means that the variance is
equal to 0.

III. The variance of a random variable is equal to the difference
between the expectation of the square of that random variable and the
square of its expectation. To see this, we apply properties IV and
III to the expectation of $(X - \bar{x})^2$ to obtain


\[ \operatorname{var} X = E\{(X - \bar{x})^2\} = E\{X^2 - 2\bar{x}X + \bar{x}^2\} = E(X^2) - 2\bar{x}E(X) + \bar{x}^2. \]
Since $E(X) = \bar{x}$,
\[ \operatorname{var} X = E(X^2) - 2\bar{x}^2 + \bar{x}^2 = E(X^2) - \bar{x}^2 \]
or
\[ \operatorname{var} X = E(X^2) - [E(X)]^2. \qquad (9.8) \]
IV. The variance of the product of a random variable X and a
constant c is equal to the product of the square of the constant and
the variance of the random variable.
Proof: Suppose that


\[ E(X) = \bar{x}, \qquad \sigma_x^2 = E[(X - \bar{x})^2]. \]
From property IV of the expectation,
\[ E(cX) = cE(X) = c\bar{x}. \]
Therefore,
\[ \operatorname{var} cX = E\{(cX - c\bar{x})^2\} = E\{c^2 (X - \bar{x})^2\} = c^2 E\{(X - \bar{x})^2\} \]
or


\[ \sigma_{cX}^2 = c^2 \sigma_x^2. \qquad (9.9) \]


V. The variance of a sum of mutually independent random variables
is equal to the sum of their variances.




We shall prove this only for two terms. The proof is the same 
for a larger number. 

Let us denote by $\bar{x}$ and $\bar{y}$ the expectations of the random
variables X and Y. Since


\[ E(X + Y) = \bar{x} + \bar{y} \]
according to the theorem on the addition of expectations, it follows
from (9.8) that
\[ \operatorname{var}(X + Y) = E\{(X + Y)^2\} - (\bar{x} + \bar{y})^2. \]


If we expand the terms $(X + Y)^2$ and $(\bar{x} + \bar{y})^2$ and use the theorem on
the addition of expectations and properties IV and III of expectations,
we obtain


\[ \operatorname{var}(X + Y) = E(X^2) + 2E(XY) + E(Y^2) - \bar{x}^2 - 2\bar{x}\bar{y} - \bar{y}^2. \]
Since X and Y are assumed independent,
\[ E(XY) = E(X)E(Y) = \bar{x}\bar{y}. \]


The second and fifth terms of the right side of the preceding
equation cancel each other out. If we use equation (9.4) of the
preceding section, we obtain our formula for the addition of variances:
\[ \operatorname{var}(X + Y) = \operatorname{var} X + \operatorname{var} Y \]


or
\[ \sigma_{x+y}^2 = \sigma_x^2 + \sigma_y^2. \qquad (9.10) \]
It is easy to show in the same way that


\[ \operatorname{var}(X - Y) = \operatorname{var} X + \operatorname{var} Y, \qquad \sigma_{x-y}^2 = \sigma_x^2 + \sigma_y^2. \qquad (9.11) \]


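VI. If


\[ U = aX + bY + cZ + \dots + r, \]


where X, Y, Z, ... are mutually independent random variables and
a, b, c, ..., r are constants, the variance of U is determined by the
formula


\[ \sigma_U^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + c^2\sigma_z^2 + \dots + 0. \]


From the theorem on addition of variances (generalized for the
algebraic sum of several terms),


\[ \sigma_U^2 = \sigma_{aX}^2 + \sigma_{bY}^2 + \sigma_{cZ}^2 + \dots + \sigma_r^2. \]


By property IV,


\[ \sigma_{aX}^2 = a^2\sigma_x^2, \qquad \sigma_{bY}^2 = b^2\sigma_y^2, \qquad \sigma_{cZ}^2 = c^2\sigma_z^2, \quad \dots, \]


and r is a constant, so that, according to property II,


\[ \sigma_r^2 = 0. \]


Comparing the last two equations, we obtain


\[ \operatorname{var} U = a^2 \operatorname{var} X + b^2 \operatorname{var} Y + c^2 \operatorname{var} Z + \dots + 0. \qquad (9.12) \]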
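
Properties III and VI lend themselves to a direct check by enumeration. The sketch below is illustrative only: it assumes two independent fair dice X and Y and the hypothetical constants a = 2, b = -3, r = 5, computes the variance of U = aX + bY + r from the definition (9.7), and compares it with the right side of (9.12).

```python
from fractions import Fraction as F
from itertools import product

def expectation(table):
    """table is a list of (value, probability) pairs."""
    return sum(v * p for v, p in table)

def variance(table):
    """Definition (9.7): expectation of the squared deviation."""
    mean = expectation(table)
    return sum(p * (v - mean) ** 2 for v, p in table)

die = [(x, F(1, 6)) for x in range(1, 7)]  # fair die (assumption)
a, b, r = 2, -3, 5                          # hypothetical constants

# Table of U = aX + bY + r for independent X, Y: p_kl = p_k * p'_l.
u = [(a * x + b * y + r, px * py) for (x, px), (y, py) in product(die, die)]

var_die = variance(die)
var_u = variance(u)
second_moment = expectation([(v * v, p) for v, p in die])

print("var X =", var_die)
print("E(X^2) - [E(X)]^2 =", second_moment - expectation(die) ** 2)
print("var U =", var_u, " a^2 var X + b^2 var Y =", (a * a + b * b) * var_die)
```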


45. EXPECTATION AND VARIANCE OF THE 
NUMBER OF OCCURRENCES 


Suppose that we perform n independent trials with respect to an
event C and that in each of these trials the probability of C is equal
to p. The number of occurrences k of the event C is a discrete
random variable, which can assume the values 0, 1, 2, 3, ..., n.
As was shown in Sections 36-37, the distribution table of this
random variable is of the form


\[
\begin{array}{c|cccccccc}
k & 0 & 1 & 2 & \dots & m & \dots & n - 1 & n \\
P_{k,n} & q^n & npq^{n-1} & \dfrac{n(n-1)}{1 \cdot 2} p^2 q^{n-2} & \dots & \dfrac{n(n-1) \dots (n-m+1)}{m!} p^m q^{n-m} & \dots & np^{n-1}q & p^n
\end{array}
\]


Let us derive formulas for the expectation and the variance of
the number of occurrences of the event. From the distribution table,
we get


\[
\begin{aligned}
E(k) = {} & 0 \cdot q^n + 1 \cdot npq^{n-1} + 2 \cdot \frac{n(n-1)}{1 \cdot 2} p^2 q^{n-2} + \dots \\
& + m \cdot \frac{n(n-1) \dots (n-m+1)}{m!} p^m q^{n-m} + \dots + (n-1) \cdot np^{n-1}q + n \cdot p^n,
\end{aligned}
\]


or


\[
\begin{aligned}
E(k) = np \Big[ & q^{n-1} + (n-1) p q^{n-2} + \dots + \frac{(n-1)(n-2) \dots (n-m+1)}{(m-1)!} p^{m-1} q^{n-m} + \dots \\
& + (n-1) p^{n-2} q + p^{n-1} \Big].
\end{aligned}
\]


The expression in the square brackets is the binomial expansion of
$(q + p)^{n-1}$. But q + p = 1. Therefore,


\[ E(k) = np. \]


Note that E(k) is equal to the most likely number of occurrences or
differs from it by an amount less than unity.

Let us turn now to the matter of determining the variance of the
number of occurrences. By definition, the variance is the expectation
of the square of the deviation of a random variable from its
expectation. From the results of the preceding problem, E(k) = np.




Therefore, if we denote the variance of the number of occurrences
by $\sigma_k^2$ and apply property III of variances and equation (9.8), we
obtain

\[ \sigma_k^2 = E(k^2) - n^2 p^2. \]
To calculate $E(k^2)$, we use a device appearing in the book by V. I.
Romanovskii.


It follows from the identity


\[ k^2 = k(k - 1) + k \]
that
\[ E(k^2) = E\{k(k - 1)\} + E(k). \]


Since k is a random variable assuming the values 0, 1, 2, ..., n, the
quantity k(k - 1) is also a random variable, whose values are


\[ 0,\ 0,\ 2,\ 3 \cdot 2,\ 4 \cdot 3,\ \dots \]


The probabilities of these values are the same as the probabilities
of the corresponding values of k. Therefore,


\[
\begin{aligned}
E\{k(k-1)\} = {} & 0 \cdot q^n + 0 \cdot npq^{n-1} + 2 \cdot 1 \cdot \frac{n(n-1)}{1 \cdot 2} p^2 q^{n-2}
 + 3 \cdot 2 \cdot \frac{n(n-1)(n-2)}{1 \cdot 2 \cdot 3} p^3 q^{n-3} \\
& + 4 \cdot 3 \cdot \frac{n(n-1)(n-2)(n-3)}{1 \cdot 2 \cdot 3 \cdot 4} p^4 q^{n-4} + \dots
 + m(m-1) \frac{n(n-1) \dots (n-m+1)}{1 \cdot 2 \cdot 3 \cdots m} p^m q^{n-m} + \dots \\
& + n(n-1)(n-2) p^{n-1} q + n(n-1) p^n \\
= {} & n(n-1) p^2 \Big[ q^{n-2} + (n-2) p q^{n-3} + \frac{(n-2)(n-3)}{1 \cdot 2} p^2 q^{n-4} + \dots \\
& + \frac{(n-2)(n-3) \dots (n-m+1)}{1 \cdot 2 \cdots (m-2)} p^{m-2} q^{n-m} + \dots + (n-2) p^{n-3} q + p^{n-2} \Big].
\end{aligned}
\]

It is easy to see that the expression in the square brackets is the
binomial expansion of $(q + p)^{n-2}$, which is equal to unity. Thus,


\[ E\{k(k - 1)\} = n(n - 1) p^2, \]
so that
\[ E(k^2) = n(n - 1) p^2 + np. \]


Therefore,
\[ \sigma_k^2 = n^2 p^2 - np^2 + np - n^2 p^2 = np(1 - p) \]
or, finally,


\[ \sigma_k^2 = npq. \]
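
The two results E(k) = np and $\sigma_k^2 = npq$, together with the intermediate identity E{k(k - 1)} = n(n - 1)p², can be confirmed by summing over the distribution table directly. This is a minimal sketch with hypothetical parameters n = 10 and p = 1/3, using exact rational arithmetic.

```python
from fractions import Fraction as F
from math import comb

def binomial_table(n, p):
    """Pairs (k, P_{k,n}) with P_{k,n} = C(n, k) p^k q^(n-k)."""
    q = 1 - p
    return [(k, comb(n, k) * p**k * q**(n - k)) for k in range(n + 1)]

n, p = 10, F(1, 3)  # illustrative values only
q = 1 - p
table = binomial_table(n, p)

mean = sum(k * pk for k, pk in table)
e_kk1 = sum(k * (k - 1) * pk for k, pk in table)   # E{k(k-1)}
var = e_kk1 + mean - mean**2                       # E(k^2) - [E(k)]^2

print("E(k) =", mean, " np =", n * p)
print("E{k(k-1)} =", e_kk1, " n(n-1)p^2 =", n * (n - 1) * p**2)
print("variance =", var, " npq =", n * p * q)
```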




Note that both the expectation and the variance of the number
of occurrences appear essentially in the approximate formula of
Laplace (8.7) for the problem of repeated trials. If we replace
$\sqrt{npq}$ with $\sigma_k$ and np with $\bar{k}$, this formula becomes


\[ P_{k,n} \approx \frac{1}{\sigma_k \sqrt{2\pi}}\, e^{-\frac{(k - \bar{k})^2}{2\sigma_k^2}}. \qquad (9.13) \]

Chapter 10 
THE LAW OF LARGE NUMBERS 


The law of large numbers is the name applied to the set of 
theorems on the probabilities of events in the case of a large 
number of trials or on the probabilities of the values of sums of 
random variables when the number of variables added is great. 


46. THE CHEBYSHEV-MARKOV LEMMA 


LEMMA. If a random variable X assumes only nonnegative values
and its expectation is equal to $\bar{x}$, the probability that X will assume
a value less than $t^2\bar{x}$ (where $t^2$ is an arbitrary positive number) is
greater than $1 - 1/t^2$; that is,


\[ P(X < t^2 \bar{x}) > 1 - \frac{1}{t^2}. \]


Let us denote by M the largest value of the random variable X.
If $t^2$ is so great that $t^2\bar{x} > M$, the conclusion of the lemma is
obviously valid because in that case
\[ P(X < t^2 \bar{x}) = 1. \]


The case in which $0 < t^2 < 1$ is also of no interest because then
\[ 1 - \frac{1}{t^2} < 0 \]
and we obtain the obvious result that the probability is greater than
a negative number.
Thus, we may assume that
\[ \bar{x} < t^2 \bar{x} \le M. \]


It follows from this inequality and from property V of the expecta- 
tion (see Section 42) that among the values of X there is at least 




one greater and one less than $t^2\bar{x}$. Let us consider those values of
X that are at least as great as $t^2\bar{x}$ and let us denote them by
$x_1, x_2, \dots, x_l$. Then, we can write the distribution table in the form


\[
\begin{array}{c|ccccccc}
X & x_1 & x_2 & \dots & x_l & x_{l+1} & \dots & x_n \\
  & p_1 & p_2 & \dots & p_l & p_{l+1} & \dots & p_n
\end{array}
\]


By definition,
\[ E(X) = \bar{x} = x_1 p_1 + x_2 p_2 + \dots + x_l p_l + x_{l+1} p_{l+1} + \dots + x_n p_n. \]


Let us discard all the terms on the right whose subscripts are
equal to or greater than l + 1. Since all the $x_s$ are assumed to be
nonnegative and the $p_s$ are positive (for s = 1, 2, ..., n), the
right side can only decrease as a result of this discarding of terms.
Therefore,


\[ \bar{x} > x_1 p_1 + x_2 p_2 + \dots + x_l p_l. \]
Let us replace all the $x_r$ (for r = 1, 2, ..., l) on the right side of
this inequality by $t^2\bar{x}$, which is either less than all the numbers for
which it is substituted or equal to one of them. Since all the $p_r$ are
positive, the right side decreases.
Therefore,


\[ \bar{x} > t^2 \bar{x} (p_1 + p_2 + \dots + p_l) \]
or, since $t^2 > 1$,
\[ p_1 + p_2 + \dots + p_l < \frac{1}{t^2}. \]
From the theorem on addition of probabilities, the left side of this
inequality is the probability that X will take one of the values
$x_1, x_2, \dots, x_l$, that is, that the value of X will be at least equal to
$t^2\bar{x}$. If we denote by $P(X \ge t^2\bar{x})$ the probability that
$X \ge t^2\bar{x}$, we have


\[ P(X \ge t^2 \bar{x}) < \frac{1}{t^2}. \]


The quantity X has a value either greater than $t^2\bar{x}$ or equal to it or
less than it. Therefore,


\[ P(X < t^2 \bar{x}) + P(X \ge t^2 \bar{x}) = 1. \]


If we compare this last equation with the preceding inequality, we
obtain


\[ P(X < t^2 \bar{x}) > 1 - \frac{1}{t^2}, \qquad (10.1) \]


which completes the proof. 




Thus we see that if t is a very large number in the inequality
\[ P(X \ge t^2 \bar{x}) < \frac{1}{t^2}, \]


values of X much greater than its expectation are extremely
unlikely.
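
The lemma can be tried out on a concrete nonnegative random variable. In the following sketch the distribution table is hypothetical (values 0, 1, 2, 10), and the tail probability $P(X \ge t^2\bar{x})$ is compared with the bound $1/t^2$ for a few assumed values of $t^2$, including one for which $t^2\bar{x}$ exceeds the largest value M.

```python
from fractions import Fraction as F

# Hypothetical nonnegative random variable (illustration only).
values = [0, 1, 2, 10]
probs = [F(4, 10), F(3, 10), F(2, 10), F(1, 10)]

mean = sum(x * p for x, p in zip(values, probs))

def tail(t2):
    """P(X >= t^2 * mean), computed directly from the table."""
    return sum(p for x, p in zip(values, probs) if x >= t2 * mean)

# For each t^2: (exact tail probability, bound 1/t^2 from the lemma).
checks = {t2: (tail(t2), F(1, t2)) for t2 in (2, 3, 5, 6)}

for t2, (exact, bound) in checks.items():
    print(f"t^2 = {t2}: P(X >= t^2 * mean) = {exact} < {bound}")
```

The bound is typically far from tight, which is the price paid for its holding under no assumption other than nonnegativity.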


47. THEOREM OF J. BERNOULLI 


Definition. The ratio of the number of occurrences k to the number
of trials n in the problem of repeated trials is called the relative
frequency. If we perform experiments, this quantity is the ratio of
the number of times that the event has occurred to the total number
of trials.

Bernoulli showed that when a large number of trials are made,
the relative frequency must, in the majority of cases, be close to the
probability. This is the gist of Bernoulli’s theorem. We give two
formulations of the theorem.

Formulation I. Suppose that an infinite number of trials are made
that are independent with respect to an event C and suppose that the
probability p of the event is constant for all of them. Then, we may
expect with probability arbitrarily close to unity that when a
sufficiently large number n of trials are made, the difference between
the relative frequency k/n and the probability p will be arbitrarily
small in absolute value.

Formulation II. With the same assumptions as in the first
formulation, for any two positive numbers $\varepsilon$ and $\delta$, no matter how
small, there exists a number N such that when the number of trials
exceeds N, the probability that $|k/n - p| < \varepsilon$ will be greater than
$1 - \delta$; that is,


\[ P\left[ \left| \frac{k}{n} - p \right| < \varepsilon \right] > 1 - \delta \quad \text{for} \quad n > N(\varepsilon, \delta) = \frac{pq}{\varepsilon^2 \delta}. \qquad (10.2) \]


The second formulation differs from the first in the following
ways: the deviation of the relative frequency from the probability
is bounded by the number $\varepsilon$, the number $\delta$ gives an indication of the
closeness of the probability to 1, and the number N, defined in
terms of $\varepsilon$ and $\delta$, represents a number of trials that is large
enough for the stated inequality to be true.

Proof: The random variable $(k - np)^2$ can have only nonnegative
values.

The expectation of this variable is known. It is the variance of
the number of occurrences, which is equal to npq (see Section 45).
If we apply Chebyshev’s lemma to the random variable $(k - np)^2$,
we obtain


\[ P\left[ (k - np)^2 < t^2 npq \right] > 1 - \frac{1}{t^2}. \]




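The probability that this inequality will be satisfied is unchanged if
we replace it with an equivalent inequality. The inequality in the
square brackets is equivalent to the inequality


\[ \left| \frac{k}{n} - p \right| < t \sqrt{\frac{pq}{n}}, \]


in which t is assumed to be positive. Therefore,


\[ P\left[ \left| \frac{k}{n} - p \right| < t \sqrt{\frac{pq}{n}} \right] > 1 - \frac{1}{t^2}. \]


Let us determine t in such a way that


\[ t \sqrt{\frac{pq}{n}} = \varepsilon. \]


From this condition, we obtain


\[ t^2 = \frac{n\varepsilon^2}{pq}, \qquad \frac{1}{t^2} = \frac{pq}{n\varepsilon^2}. \]


Then,


\[ P\left[ \left| \frac{k}{n} - p \right| < \varepsilon \right] > 1 - \frac{pq}{n\varepsilon^2}. \qquad (10.3) \]


The derivation of this inequality essentially proves the theorem
in its first formulation since for given values of p and q and for an
arbitrarily small positive number $\varepsilon$, there always exists a
sufficiently large number N such that the number on the right side of
the inequality can be made arbitrarily close to 1 if n > N. To
prove the theorem in its second formulation, let us assume $\varepsilon$ and $\delta$
given and let us define N by


\[ N = \frac{pq}{\varepsilon^2 \delta}. \qquad (10.4) \]


If n > N, then


\[ \frac{pq}{n\varepsilon^2} < \frac{pq}{N\varepsilon^2} = \delta \]


and, consequently,


\[ 1 - \frac{pq}{n\varepsilon^2} > 1 - \delta. \]


We shall not violate the inequality (10.3) if we replace its right
side with $1 - \delta$. This proves the theorem in its second formulation.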




It should be noted that in the definition of N and in the proof of
Bernoulli’s theorem, we used the lemma just proven (after the
double transformation of an exact equality into an inequality).
Therefore, the value of N that we obtain in this manner is sufficient but
it should not be considered necessary. A more exact evaluation
shows that the inequalities of Bernoulli’s theorem are valid even
when the number of trials is smaller than N.

It is possible to find a sufficiently large number of trials to
ensure the validity of the inequality in Bernoulli’s theorem even if
we do not know the value of the probability. Consider the quantity
pq = v, which is the product of two numbers whose sum is unity.
Let us find the maximum value of the function


\[ v = p(1 - p). \]


If we set the first derivative equal to 0, we see that v has an
extremum at p = 1/2 (and q = 1/2), where v takes the value 1/4; the sign
of the second derivative shows that it is a maximum. Therefore,
no matter what the values of p and q, we always have


\[ v = pq \le \frac{1}{4}. \]
Therefore, on the basis of equation (10.4), a sufficiently large
number of trials for all values of p is


\[ N = \frac{1}{4\varepsilon^2 \delta}. \qquad (10.5) \]


This remark is significant for the converse of Bernoulli’s theorem, 
according to which we may take the observed frequency of occur- 
rences as the approximate value of the unknown probability. This 
is not the exact content of the converse of Bernoulli’s theorem, 
which is formulated analogously to the direct theorem given above. 

Let us consider some examples of the application of Bernoulli’s 
theorem. 


Example 1. A coin is tossed. Find a sufficient number of trials for us to be able to
expect with a probability greater than 0.9 that the relative frequency will differ from the
probability of 1/2 by an amount less than 1/5 in absolute value.

In the present case, p = q = 1/2, $\varepsilon$ = 0.2, $\delta$ = 1 - 0.9 = 0.1. From formula (10.4),
we obtain


\[ N = \frac{pq}{\varepsilon^2 \delta} = \frac{0.25}{0.04 \cdot 0.1} = 62.5. \]


Thus, we can assert that if $n \ge 63$, the probability that $|k/n - 1/2| < 0.2$ will be greater
than 0.9:


\[ P\left( \left| \frac{k}{n} - \frac{1}{2} \right| < 0.2 \right) > 0.9. \]


We note that this inequality allows rather wide bounds for the number of occurrences,
for the inequality $|k/n - 1/2| < 0.2$ is equivalent to the two inequalities


\[ -0.2 < \frac{k}{n} - 0.5 < 0.2 \]


or


\[ 0.3n < k < 0.7n. \]


The same number of trials is obtained if we use the “cruder” formula


\[ N = \frac{1}{4\varepsilon^2 \delta} = \frac{1}{4 \cdot 0.04 \cdot 0.1} = 62.5. \]


This is explained by the fact that in the present case v = pq = 1/4. We set n = 100.
Then,


\[ 30 < k < 70. \]


To find narrower bounds for k, we would need to decrease $\varepsilon$, which would considerably
increase N, as can be seen from formula (10.4), since we have $\varepsilon^2$ in the denominator.
Example 2. A coin is tossed two hundred times. Estimate the probability that


\[ \left| \frac{k}{200} - 0.5 \right| < 0.1. \]
Here p = q = 0.5 and $\varepsilon$ = 0.1. We take two hundred for the value of N.


A lower bound on the probability of the inequality given will be known if we determine
$\delta$. From formula (10.4),


\[ \delta = \frac{pq}{N\varepsilon^2} = \frac{0.25}{200 \cdot 0.1^2} = \frac{1}{8}. \]


Thus, when we have tossed the coin two hundred times, we may expect, with a probability
exceeding 1 - 1/8 = 0.875, that


\[ \left| \frac{k}{200} - 0.5 \right| < 0.1 \]
or


\[ 80 < k < 120. \]


Example 3. A coin is tossed nine hundred times. What are bounds within which the
number of heads will be expected to lie with a probability exceeding 0.99?

Here, p = 1/2, q = 1/2, 1 - $\delta$ = 0.99, and $\delta$ = 0.01. For N we take 900. We
determine $\varepsilon$ from formula (10.4):


\[ \varepsilon^2 = \frac{pq}{N\delta} = \frac{1}{36}, \qquad \varepsilon = \frac{1}{6}. \]
Consequently,
\[ \left| \frac{k}{900} - 0.5 \right| < \frac{1}{6} \]
or
\[ 300 < k < 600. \]


Thus, we may expect with probability exceeding 0.99 that the number of times that the
coin falls heads will lie between 300 and 600 if the coin is tossed 900 times.

Example 4. A die is thrown 1,200 times. Estimate the probability that the number of
sixes will lie between 150 and 250.

In this example, as in the preceding examples, the bounds on the number of
occurrences of the event are symmetric about the expectation of the number of occurrences,
which in the present case is equal to 1,200 · (1/6) = 200. This value of the bounds is
explained by the form in which Bernoulli’s theorem is proven:


\[ P\left( \left| \frac{k}{n} - p \right| < \varepsilon \right) > 1 - \delta. \]


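From these bounds, we determine $\varepsilon$ first:


\[ -\varepsilon < \frac{k}{1200} - \frac{1}{6} < \varepsilon, \]


\[ 200 - 1200\varepsilon < k < 200 + 1200\varepsilon, \]
\[ 200 - 1200\varepsilon = 150; \qquad \varepsilon = \frac{1}{24}. \]
Let us check the right side:


\[ 200 + 1200\varepsilon = 200 + 1200 \cdot \frac{1}{24} = 250. \]


We set N = 1,200. In order to obtain a figure for the probability, we need to compute $\delta$.
From formula (10.4), we obtain


\[ \delta = \frac{pq}{N\varepsilon^2} = \frac{\frac{1}{6} \cdot \frac{5}{6}}{1200 \cdot \left( \frac{1}{24} \right)^2} = \frac{1}{15}. \]


Thus, we may expect with a probability exceeding 14/15 that if the die is thrown 1,200
times sixes will fall between 150 and 250 times.
We note that if we use the cruder formula (10.5) to determine $\delta$,


\[ \delta = \frac{1}{4N\varepsilon^2}, \]


we get a much less satisfactory result: $\delta$ = 3/25.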
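
The arithmetic of Examples 1 to 4 can be reproduced in a few lines. The sketch below takes formula (10.4) to be N = pq/(ε²δ) and formula (10.5) to be N = 1/(4ε²δ), as used in the examples; the function names are hypothetical and exact fractions are used throughout.

```python
from fractions import Fraction as F

def sufficient_trials(p, eps, delta):
    """N = pq / (eps^2 * delta): a sufficient (not necessary) number of trials."""
    return p * (1 - p) / (eps**2 * delta)

def failure_bound(p, n, eps):
    """delta = pq / (n * eps^2): the same relation solved for delta."""
    return p * (1 - p) / (n * eps**2)

def crude_failure_bound(n, eps):
    """delta = 1 / (4 n eps^2): uses pq <= 1/4, so p need not be known."""
    return F(1, 4) / (n * eps**2)

n1 = sufficient_trials(F(1, 2), F(1, 5), F(1, 10))   # Example 1
d2 = failure_bound(F(1, 2), 200, F(1, 10))           # Example 2
eps3_sq = F(1, 2) * F(1, 2) / (900 * F(1, 100))      # Example 3: eps^2 = pq/(N delta)
d4 = failure_bound(F(1, 6), 1200, F(1, 24))          # Example 4
d4_crude = crude_failure_bound(1200, F(1, 24))       # Example 4, cruder formula

print("Example 1: N =", float(n1))
print("Example 2: delta =", d2)
print("Example 3: eps^2 =", eps3_sq)
print("Example 4: delta =", d4, " crude delta =", d4_crude)
```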


48. LAPLACE'S LIMIT THEOREM 


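THEOREM. Suppose that trials are made that are independent with
respect to a certain event and that in each of them the probability
of the event is a constant p. If the number of trials n increases
without bound, the limit, as $n \to \infty$, of the probability that the
number of occurrences k of the event will lie between the numbers
$k_1$ and $k_2$ (for $k_1 < k_2$) is


\[ \frac{1}{\sqrt{2\pi}} \int_{z_1}^{z_2} e^{-\frac{t^2}{2}}\, dt, \]


where


\[ z_1 = \frac{k_1 - np}{\sqrt{npq}}, \qquad z_2 = \frac{k_2 - np}{\sqrt{npq}}, \]


or, more briefly,


\[ \lim_{n \to \infty} P(k_1 \le k \le k_2) = \frac{1}{\sqrt{2\pi}} \int_{z_1}^{z_2} e^{-\frac{t^2}{2}}\, dt. \qquad (10.6) \]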


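Proof: Formula (8.8)


\[ P_{k,n} \approx \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2}{2}}\, \Delta t, \]


where


\[ t = \frac{k - np}{\sqrt{npq}}, \qquad \Delta t = \frac{1}{\sqrt{npq}}, \]


was derived in Section 38 for calculating the approximate value of
the probability $P_{k,n}$. Here, $\Delta t$ is the increase in t corresponding to two
values of k differing by unity. From the theorem on addition of
probabilities,


\[ P(k_1 \le k \le k_2) = \sum_{k=k_1}^{k_2} P_{k,n} \qquad (10.7) \]


(for example, $P(6 \le k \le 10) = P_{6,n} + P_{7,n} + P_{8,n} + P_{9,n} + P_{10,n}$). We
denote by $z_1$ and $z_2$ the values of t corresponding to the numbers $k_1$
and $k_2$:


\[ z_1 = \frac{k_1 - np}{\sqrt{npq}}, \qquad z_2 = \frac{k_2 - np}{\sqrt{npq}}. \]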


If we replace all the numbers $P_{k,n}$ in (10.7) with their approximate
values according to the transformed formula (8.8), we obtain the
approximate equation


\[ P(k_1 \le k \le k_2) \approx \sum \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2}{2}}\, \Delta t, \qquad (10.8) \]


where the summation is carried out for all values of t corresponding
to the integral values of k from $k_1$ to $k_2$. It follows from the
expression for $\Delta t$ that $\Delta t \to 0$ as $n \to \infty$. The function $e^{-t^2/2}$, which
takes discrete values for finite n, is continuous. Therefore, if we
replace the sum in (10.8) with an integral, we obtain the exact
equation:


\[ \lim_{n \to \infty} P(k_1 \le k \le k_2) = \frac{1}{\sqrt{2\pi}} \int_{z_1}^{z_2} e^{-\frac{t^2}{2}}\, dt. \]

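Let us now break the integral in the basic formula just obtained
into two integrals and then reverse the limits of integration in one
of them, as follows:


\[ \int_{z_1}^{z_2} e^{-\frac{t^2}{2}}\, dt = \int_{z_1}^{0} e^{-\frac{t^2}{2}}\, dt + \int_{0}^{z_2} e^{-\frac{t^2}{2}}\, dt = \int_{0}^{z_2} e^{-\frac{t^2}{2}}\, dt - \int_{0}^{z_1} e^{-\frac{t^2}{2}}\, dt. \]


Let us define


\[ \Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{0}^{z} e^{-\frac{t^2}{2}}\, dt. \qquad (10.9) \]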




The basic formula can now be written in the form


\[ \lim_{n \to \infty} P(k_1 \le k \le k_2) = \Phi(z_2) - \Phi(z_1). \qquad (10.10) \]


The values of the function $\Phi(z)$ can, for small values of z, be
calculated by expanding the integrand in (10.9) in a Maclaurin series,
integrating the series obtained termwise, and then finding the first
few partial sums of the integrated series.

It will be sufficient to calculate the function $\Phi(z)$ for positive
values of z since this is an odd function. That is, by replacing z
with $-z$ and t with $-\tau$ in the integral (10.9), we obtain


\[ \Phi(-z) = \frac{1}{\sqrt{2\pi}} \int_{0}^{-z} e^{-\frac{t^2}{2}}\, dt = -\frac{1}{\sqrt{2\pi}} \int_{0}^{z} e^{-\frac{\tau^2}{2}}\, d\tau, \]
or
\[ \Phi(-z) = -\Phi(z). \]
Values of the function $\Phi(z)$ for $0.00 \le z \le 5.00$ are given in Table
III at the end of the book.
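
The series method just described can be sketched in code. Expanding $e^{-t^2/2}$ in a Maclaurin series and integrating termwise from 0 to z gives $\Phi(z) = \frac{1}{\sqrt{2\pi}} \sum_{j \ge 0} \frac{(-1)^j z^{2j+1}}{2^j\, j!\, (2j+1)}$. The example below is only an illustration: the function names and the cutoff of 60 terms are assumptions, and the result is cross-checked against the identity $\Phi(z) = \tfrac{1}{2}\operatorname{erf}(z/\sqrt{2})$ from the Python standard library.

```python
import math

def phi_series(z, terms=60):
    """Phi(z) from the termwise-integrated Maclaurin series of exp(-t^2/2)."""
    total = 0.0
    for j in range(terms):
        total += (-1) ** j * z ** (2 * j + 1) / (2 ** j * math.factorial(j) * (2 * j + 1))
    return total / math.sqrt(2 * math.pi)

def phi_erf(z):
    """The same function via the error function: Phi(z) = erf(z / sqrt(2)) / 2."""
    return 0.5 * math.erf(z / math.sqrt(2))

for z in (0.5, 1.0, 2.0, 3.0):
    print(f"z = {z}: series = {phi_series(z):.5f}, erf = {phi_erf(z):.5f}")
```

For larger z the alternating series loses accuracy in floating point, which is consistent with the text's restriction of the method to small values of z.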

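In practice, Laplace’s theorem is used for finding the approximate
value of this probability for large values of n. In this case,
\[ P(k_1 \le k \le k_2) \approx \Phi(z_2) - \Phi(z_1), \]

where


\[ z_1 = \frac{k_1 - np}{\sqrt{npq}}, \qquad z_2 = \frac{k_2 - np}{\sqrt{npq}}. \]


Laplace’s theorem can be put in yet another form. The
inequalities $k_1 \le k \le k_2$ are equivalent to the inequalities


\[ \alpha_1 \le \frac{k}{n} - p \le \alpha_2, \]


where


\[ \alpha_1 = \frac{k_1 - np}{n}, \qquad \alpha_2 = \frac{k_2 - np}{n}. \]


Then,


\[ z_1 = \frac{k_1 - np}{\sqrt{npq}} = \frac{\alpha_1}{\sqrt{pq/n}}, \qquad z_2 = \frac{k_2 - np}{\sqrt{npq}} = \frac{\alpha_2}{\sqrt{pq/n}}. \]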

The numbers $\alpha_1$ and $\alpha_2$ can be given instead of the numbers $k_1$ and
$k_2$. They represent the bounds between which the deviation of the
relative frequency k/n from the probability p lies; they can be either
negative or positive.

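The probability is not changed if we replace this inequality with
an equivalent one. Therefore, we may write


\[ \lim_{n \to \infty} P\left( \alpha_1 \le \frac{k}{n} - p \le \alpha_2 \right) = \frac{1}{\sqrt{2\pi}} \int_{z_1}^{z_2} e^{-\frac{t^2}{2}}\, dt = \Phi(z_2) - \Phi(z_1), \]


where


\[ z_1 = \frac{\alpha_1}{\sqrt{pq/n}}, \qquad z_2 = \frac{\alpha_2}{\sqrt{pq/n}}. \]


In particular, if $\alpha_1 < 0$ and $\alpha_2 = -\alpha_1 = \alpha$, then, because $\Phi(z)$ is an
odd function,


\[ \lim_{n \to \infty} P\left( \left| \frac{k}{n} - p \right| \le \alpha \right) = 2\Phi\left( \frac{\alpha}{\sqrt{pq/n}} \right). \qquad (10.12) \]


If n is a large number, we may write the approximate equation


\[ P\left( \left| \frac{k}{n} - p \right| \le \alpha \right) \approx 2\Phi\left( \frac{\alpha}{\sqrt{pq/n}} \right). \]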


The function $\Phi(z)$ assumes values only slightly different from 0.5
when z is greater than 3. Therefore, the limit of P is close to unity
if $\alpha$ exceeds $3\sqrt{pq/n}$ ($\Phi(z) = 0.49865$ if z = 3). Suppose, for
example, that n = 100 and p = q = 0.5. Then, $3\sqrt{pq/n} = 0.15$ and,
on the basis of Laplace’s theorem, we may assert that the
absolute value of the deviation of k/n from p will almost certainly be
less than 0.15 (that is, with probability close to unity). (This
assertion is true only approximately.) Although Laplace’s theorem
is valid only in the limit, it is frequently used for calculating
probabilities even when n is finite. Quite satisfactory results are
obtained if npq is of the order of 100 or greater.


Example 1. A coin is thrown one hundred times. Calculate the probability of its
landing tails between forty and sixty times. Here, n = 100, p = q = 0.5, $\sqrt{npq} = 5$,
$k_1 = 40$, and $k_2 = 60$. Therefore, $z_1 = -2$ and $z_2 = 2$. We obtain as an approximation

\[ P(40 \le k \le 60) \approx \Phi(2) - \Phi(-2) = 2\Phi(2). \]
From the table, we find $\Phi(2) = 0.4772$. Therefore,
\[ P(40 \le k \le 60) \approx 0.95. \]
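
It is instructive to set the Laplace approximation of Example 1 beside the exact sum of binomial probabilities (10.7). The sketch below is an illustration, not part of the original example; $\Phi$ is evaluated through the standard-library error function rather than from Table III.

```python
import math

def phi(z):
    """Phi(z) of (10.9), via the identity Phi(z) = erf(z / sqrt(2)) / 2."""
    return 0.5 * math.erf(z / math.sqrt(2))

n, p = 100, 0.5
q = 1 - p
k1, k2 = 40, 60

s = math.sqrt(n * p * q)
z1, z2 = (k1 - n * p) / s, (k2 - n * p) / s

approx = phi(z2) - phi(z1)
# Exact value: sum of P_{k,n} over k = k1, ..., k2.
exact = sum(math.comb(n, k) * p**k * q**(n - k) for k in range(k1, k2 + 1))

print(f"Laplace approximation: {approx:.4f}")
print(f"exact binomial sum:    {exact:.4f}")
```

Here npq = 25, well below the order of 100 recommended above, so a visible gap between the two figures is to be expected.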
Example 2. A die is thrown 2,400 times. Determine the probability that the absolute
value of the difference between the relative frequency and the probability of the die
falling on six will not exceed 1/24.




Here, p = 1/6, q = 5/6, npq = 333, and


1 
— 4 = Og SA OG? — 23 29 = 274: 
consequently, 


k 1 i 
ee oe — ~~ 2.7 == 0,994. 
P( | sixo =|) <za 20 (2.14) 


49. INEQUALITIES AND CHEBYSHEV'S THEOREM 


Let us consider n pairwise independent random variables $X^{(1)}, X^{(2)},
\dots, X^{(n)}$. Here, the superscripts are indices; that is, they are used
as a way of numbering these random variables. They do not
indicate the values that the random variables may take. Each of them
may take several values, the number of which can be different
from variable to variable.

Let us denote the expectations of these random variables by
$\bar{x}_1, \bar{x}_2, \dots, \bar{x}_n$ and their variances by $\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2$. Consider the
sum of these random variables


\[ X = X^{(1)} + X^{(2)} + \dots + X^{(n)}. \]


We denote the expectation and the variance of this sum by


\[ \bar{x} = E(X), \qquad \sigma^2 = E\{(X - \bar{x})^2\}. \]


From the theorems on the addition of expectations and variances,
we have


\[ \bar{x} = \bar{x}_1 + \bar{x}_2 + \dots + \bar{x}_n, \]
\[ \sigma^2 = \sigma_1^2 + \sigma_2^2 + \dots + \sigma_n^2. \]
The random variable $(X - \bar{x})^2$ can assume only nonnegative values,
and its expectation is known. Therefore, we may apply the
Chebyshev-Markov lemma to it:


\[ P\left[ (X - \bar{x})^2 < t^2 \sigma^2 \right] > 1 - \frac{1}{t^2}.\ {}^{*} \]


Let us replace the inequality in the square brackets with the
equivalent inequality


\[ |X - \bar{x}| < t\sigma. \]


*We recall that $t^2$ is an arbitrary positive number.




(The probability of obtaining the inequality is not changed by this.)
Then, we obtain


\[ P\{ |X - \bar{x}| < t\sigma \} > 1 - \frac{1}{t^2}. \qquad (10.13) \]


If X is understood to refer to one random variable only, we shall
call this inequality Chebyshev’s first inequality. It enables us to
set a bound for the probability of a given deviation from $\bar{x}$ for an
arbitrary distribution law.
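
Chebyshev’s first inequality can be compared with an exact probability for any given table. The sketch below assumes a fair die and $t^2 = 2$ (both chosen only for illustration); the event $|X - \bar{x}| < t\sigma$ is tested in the equivalent squared form $(X - \bar{x})^2 < t^2\sigma^2$ so that no square roots are needed.

```python
from fractions import Fraction as F

def chebyshev_first(values, probs, t2):
    """Return (exact P(|X - mean| < t*sigma), lower bound 1 - 1/t^2)."""
    mean = sum(x * p for x, p in zip(values, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))
    inside = sum(p for x, p in zip(values, probs) if (x - mean) ** 2 < t2 * var)
    return inside, 1 - 1 / F(t2)

die_values = list(range(1, 7))
die_probs = [F(1, 6)] * 6   # fair die (assumption)

prob, bound = chebyshev_first(die_values, die_probs, 2)
print("exact probability:", prob, " Chebyshev lower bound:", bound)
```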

If we replace X, $\bar{x}$, $\sigma^2$ in inequality (10.13) with the expressions
that we listed above for these quantities, we obtain


\[ P\left\{ \left| X^{(1)} + X^{(2)} + \dots + X^{(n)} - (\bar{x}_1 + \bar{x}_2 + \dots + \bar{x}_n) \right| < t\sqrt{\sigma_1^2 + \sigma_2^2 + \dots + \sigma_n^2} \right\} > 1 - \frac{1}{t^2}. \qquad (10.14) \]
We shall call this inequality Chebyshev’s second inequality. If we
divide both sides of the inequality appearing inside the braces by
n, we obtain Chebyshev’s second inequality in a different form:


\[ P\left\{ \left| \frac{X^{(1)} + X^{(2)} + \dots + X^{(n)}}{n} - \frac{\bar{x}_1 + \bar{x}_2 + \dots + \bar{x}_n}{n} \right| < \frac{t\sqrt{\sigma_1^2 + \sigma_2^2 + \dots + \sigma_n^2}}{n} \right\} > 1 - \frac{1}{t^2}. \qquad (10.15) \]

Here, $(X^{(1)} + X^{(2)} + \dots + X^{(n)})/n$ is the arithmetic mean of the random
values of the given variables and $(\bar{x}_1 + \bar{x}_2 + \dots + \bar{x}_n)/n$ is the
arithmetic mean of their expectations. Chebyshev’s inequality,
when written in the second form, gives an estimate of the
probability that the absolute value of the deviation of the first mean from
the second mean will be less than a quantity depending on the sum
of the variances of the given random variables.

Let us now prove Chebyshev’s theorem. 

THEOREM. Suppose that the random variables $X^{(1)}, X^{(2)}, \dots,
X^{(n)}$ are pairwise independent and that they have given expectations
$\bar{x}_1, \bar{x}_2, \dots, \bar{x}_n$ and given uniformly bounded variances. Then, we
may expect with a probability arbitrarily close to unity that the
difference between the arithmetic mean of
the given variables and the arithmetic mean of their expectations
will be arbitrarily small in absolute value if n is sufficiently large.

Let us denote by $\varepsilon$ an upper bound on the absolute value of the
difference between the arithmetic means referred to in the theorem.
Then, Chebyshev’s theorem can be written as follows:

For any two positive numbers $\varepsilon$ and $\delta$, the inequality

\[ P\left\{ \left| \frac{X^{(1)} + X^{(2)} + \dots + X^{(n)}}{n} - \frac{\bar{x}_1 + \bar{x}_2 + \dots + \bar{x}_n}{n} \right| < \varepsilon \right\} > 1 - \delta \]
will be valid for every value of $n > N(\varepsilon, \delta)$, where N is a fixed
positive number whose value can be determined from the values of $\varepsilon$
and $\delta$.




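Proof: Let us write Chebyshev’s inequality in the second form:


\[ P\left\{ \left| \frac{X^{(1)} + X^{(2)} + \dots + X^{(n)}}{n} - \frac{\bar{x}_1 + \bar{x}_2 + \dots + \bar{x}_n}{n} \right| < \frac{t\sqrt{\sigma_1^2 + \sigma_2^2 + \dots + \sigma_n^2}}{n} \right\} > 1 - \frac{1}{t^2}. \]


To convert this inequality into the inequality that we wish to prove,
let us choose t such that


\[ \frac{t\sqrt{\sigma_1^2 + \sigma_2^2 + \dots + \sigma_n^2}}{n} = \varepsilon, \qquad t = \frac{n\varepsilon}{\sqrt{\sigma_1^2 + \sigma_2^2 + \dots + \sigma_n^2}}. \]


We then have


\[ P\left\{ \left| \frac{X^{(1)} + X^{(2)} + \dots + X^{(n)}}{n} - \frac{\bar{x}_1 + \bar{x}_2 + \dots + \bar{x}_n}{n} \right| < \varepsilon \right\} > 1 - \frac{\sigma_1^2 + \sigma_2^2 + \dots + \sigma_n^2}{n^2 \varepsilon^2}. \]


By hypothesis, the variances are uniformly bounded by some
number B; that is,


\[ \sigma_1^2 \le B, \quad \sigma_2^2 \le B, \quad \dots, \quad \sigma_n^2 \le B, \]


so that


\[ \sigma_1^2 + \sigma_2^2 + \dots + \sigma_n^2 \le nB. \]


On the right side of the inequality for P, let us replace
$\sigma_1^2 + \sigma_2^2 + \dots + \sigma_n^2$ with nB. Obviously, this does not violate the
inequality. Consequently,


\[ P\left\{ \left| \frac{X^{(1)} + X^{(2)} + \dots + X^{(n)}}{n} - \frac{\bar{x}_1 + \bar{x}_2 + \dots + \bar{x}_n}{n} \right| < \varepsilon \right\} > 1 - \frac{B}{n\varepsilon^2}. \]


We now choose the number N such that $B/(N\varepsilon^2) = \delta$; that is, we
choose $N = B/(\varepsilon^2\delta)$. Then, $B/(n\varepsilon^2)$ will be less than $\delta$ if n > N.
If we replace $B/(n\varepsilon^2)$ with $\delta$ on the right side of the inequality, we
shall not violate the inequality. Thus,


\[ P\left\{ \left| \frac{X^{(1)} + X^{(2)} + \dots + X^{(n)}}{n} - \frac{\bar{x}_1 + \bar{x}_2 + \dots + \bar{x}_n}{n} \right| < \varepsilon \right\} > 1 - \delta, \qquad (10.16) \]
provided
\[ n > N = \frac{B}{\varepsilon^2 \delta}, \qquad (10.17) \]


which completes the proof of the theorem.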




COROLLARY. Suppose that a random variable X has a given
expectation a and a definite variance $\sigma^2$. Suppose that a number of
independent observations are made to determine the values of X.
Then, we may expect, with a probability differing from unity by an
arbitrarily small amount, that when a sufficiently large number of
observations are made, the arithmetic mean of the observed values
will differ in absolute value by an arbitrarily small amount from
the expectation of X.

Proof: Note that these random values of X, obtained from
independent observations, can be considered independent random
variables $\xi^{(1)}, \xi^{(2)}, \ldots, \xi^{(n)}$. Since each of them can have only those
values that X has (and with the same probabilities), the expectation
and the variance of each of them is the same as for X. Using the
notations that we used for Chebyshev’s theorem, we may write

\[ \bar{x}_1 = \bar{x}_2 = \cdots = \bar{x}_n = a, \qquad \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_n^2 = \sigma^2. \]

Therefore, the important condition for boundedness of the variances
in Chebyshev’s theorem is satisfied. Here, we take $\sigma^2$ for the
bound $\beta$. Applying Chebyshev’s theorem to the random variables
$\xi^{(1)}, \xi^{(2)}, \ldots, \xi^{(n)}$, we obtain

\[ P\left\{ \left| \frac{\xi^{(1)} + \xi^{(2)} + \cdots + \xi^{(n)}}{n} - a \right| < \varepsilon \right\} > 1 - \delta, \qquad n > N_1 = \frac{\sigma^2}{\varepsilon^2 \delta}, \tag{10.18} \]

which proves the corollary: $N_1$ is a sufficiently large number of
observations.

The corollary to Chebyshev’s theorem is of great significance 
in applications. It states that when a sufficiently large number of 
observations are made, we may expect, with a probability close to 
unity (in short, we may be virtually certain) that the average observed
value will differ from the expectation, i.e., from the
theoretical arithmetic mean, by an arbitrarily small amount. 


Let us consider an example of the application of this corollary to Chebyshev’s
theorem.*

Let us suppose that a quantity a (for example, a length) has a definite though unknown
value. Suppose that this quantity is measured n times. Because of the random errors
that occur in the measurements, we obtain different numbers $a_1, a_2, \ldots, a_n$. The
question then is how many measurements would be enough to ensure with a probability
greater than 0.99 that the average value of these measurements differs from the exact
value by an amount less than 0.1?

Since a is assumed to have a fixed value, its expectation is equal to a. To solve the
problem, we must know an upper bound on the variance of the random measurements.
In specific problems of this type, an upper bound cannot be determined, since we do not 


*It should be noted that examples of this kind serve primarily to clarify the content
of the theorem. The estimates for a sufficiently large number of observations given in
this theorem are exaggerated and, furthermore, the conditions of independence of
observations cannot always be assumed to be fulfilled.




know the probabilities of the different values. However, if analogous measurements
have been made before, we can exhibit a sort of upper bound on the variance on the
basis of the results of the earlier measurements. (This question will be taken up in
Chapter 4.) We simply assume that the upper bound on the variance in this problem is
equal to 0.02.

Since in the present case $\varepsilon = 0.1$, $\delta = 1 - 0.99 = 0.01$, and $\sigma^2 = 0.02$, we have

\[ N_1 = \frac{\sigma^2}{\varepsilon^2 \delta} = \frac{0.02}{0.01 \times 0.01} = 200. \]

Thus, if the number of observations exceeds 200, we may expect with a probability
exceeding 0.99 that the arithmetic mean of the measured values $a_1, a_2, \ldots, a_n$ differs
from a in absolute value by an amount not exceeding 0.1.
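
The arithmetic of this example can be checked with a short computation. The following is only a sketch: the function names are ours, not the book’s, and the function simply evaluates the bound of formula (10.17) with the variance bound, the tolerance, and the admissible probability shortfall as inputs.

```python
import math

def chebyshev_sample_size(variance_bound, eps, delta):
    """Return N = beta / (eps**2 * delta), the bound of formula (10.17)."""
    if variance_bound <= 0 or eps <= 0 or delta <= 0:
        raise ValueError("all arguments must be positive")
    return variance_bound / (eps ** 2 * delta)

def chebyshev_probability_bound(variance_bound, eps, n):
    """Lower bound 1 - beta / (n * eps**2) on P(|mean - a| < eps)."""
    return 1.0 - variance_bound / (n * eps ** 2)

# Data of the example: variance bound 0.02, tolerance 0.1, probability 0.99.
n_required = chebyshev_sample_size(0.02, 0.1, 0.01)
print(round(n_required))
# With more observations than required, the guaranteed probability exceeds 0.99.
print(chebyshev_probability_bound(0.02, 0.1, 400))
```

As the footnote to the example warns, the number obtained in this way is a deliberately cautious overestimate.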


50. COMMENTS ON THE LAW OF LARGE NUMBERS. 
STATISTICAL PROBABILITIES 


Putting Bernoulli’s theorem in a few words, we may say that the 
relative frequency in the case of a large number of observations 
would differ only slightly from the probability. The significant 
feature of this assertion lies in the fact that it is only probable, 
although the probability can be made arbitrarily close to unity. 

Therefore, if we were to perform the experiments described in 
the examples considered above, we should not expect that these 
experiments would definitely give agreement with the results of the 
calculations. To illustrate this, let us consider example 2 of 
Section 47. The results of the calculations showed that we may 
expect, with a probability exceeding 0.875, that the number of times
the coin would fall heads would be between 80 and 120. In a
particular experiment (in which the coin is tossed two hundred 
times), the number of times the coin falls heads may lie outside 
these limits, all the more so since the obtained lower bound on 
the probability is not very close to unity. To clarify the practical 
meaning of this result, we should imagine that a large number of 
such experiments (each consisting of tossing the coin 200 times) 
are performed. If we apply Bernoulli’s theorem repeatedly, we may 
expect with a probability close to unity that in the majority of 
these experiments the number of times the coin will fall heads will 
lie between the limits indicated. (Speaking crudely, our prediction 
on the limits will be satisfied in seven trials out of eight on the 
average.) 
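
To see how cautious the bound 0.875 is, we may compare it with the exact binomial probability. The sketch below assumes that the bound in the cited example came from Chebyshev’s inequality with variance npq = 50 and a deviation of 20 from the mean 100; that assumption, and the function name, are ours.

```python
from math import comb

def prob_heads_strictly_between(n, low, high, p=0.5):
    """Exact binomial probability that low < number of heads < high in n tosses."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(low + 1, high))

n, p = 200, 0.5
variance = n * p * (1 - p)                 # variance of the number of heads
chebyshev_bound = 1 - variance / 20 ** 2   # deviation of 20 from the mean 100
exact = prob_heads_strictly_between(n, 80, 120, p)

print(chebyshev_bound)   # the lower bound quoted in the text
print(exact)             # the exact probability, noticeably closer to unity
```

The comparison illustrates the remark that the bound is "not very close to unity": the bound is valid, but the true probability is considerably higher.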

If the experiments performed deviate considerably from the 
calculated results, we may conclude that, in the case of the 
phenomenon being examined, the conditions of Bernoulli’s theorem 
are not fulfilled. The most stringent of these conditions is the re- 
quirement that the trials be independent. 

In Bernoulli’s theorem, the probability of the event was assumed
known and the question was one of predicting (with some probability) 
limits between which the number of occurrences of the event would 
lie. Even in this form, the theorem points to the possibility of 
finding out the unknown probability of a random event from a large 
number of observations on the occurrence and nonoccurrence of 
the events in question. 




Such a possibility follows from the converse of Bernoulli’s
theorem: if we may assume that in an infinite number of independent
experiments an event has a constant probability p (the value of which
we do not know), then, with a probability arbitrarily close to unity,
we may expect that the ratio m/n of the number m of actual occurrences
of the event to the number n of observations will differ by an
arbitrarily small amount from the probability if the number of
observations is sufficiently great. Therefore,

\[ P\left\{ \left| p - \frac{m}{n} \right| < \varepsilon \right\} > 1 - \delta \tag{10.19} \]

if $n > N(\varepsilon, \delta)$, where N can be determined from the values of $\varepsilon$ and
$\delta$. For large values of n, we obtain

\[ p \approx \frac{m}{n}. \tag{10.20} \]


Probabilities calculated on the basis of the results of observations 
are said to be statistical or empirical. 
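
A statistical probability of this kind can be imitated on a computer. The sketch below is an illustration only: the probability 0.3, the seed, and the function name are arbitrary choices of ours, and a pseudo-random generator stands in for independent trials.

```python
import random

def relative_frequency(p, n, rng):
    """Relative frequency of an event of probability p in n independent trials."""
    occurrences = sum(1 for _ in range(n) if rng.random() < p)
    return occurrences / n

rng = random.Random(2024)   # fixed seed, chosen arbitrarily, for repeatability
p_true = 0.3                # hypothetical probability, "unknown" to the observer

# The relative frequency settles near p_true as the number of trials grows.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(p_true, n, rng))
```

No single run is guaranteed to come close to the true value; the theorem asserts only that a close agreement is very probable when n is large.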

Chebyshev’s theorem can be worded as follows: if the number 
of random variables is sufficiently large, the arithmetic mean of 
the random values of these variables will differ only slightly from 
the arithmetic mean of their expectations. The corollary to 
Chebyshev’s theorem can be worded analogously. These statements 
of the theorem and its corollary show that when a sufficiently large 
number of independent observations are made, it is possible, with 
a probability close to unity, to obtain from observations an approximate
value of the arithmetic mean of the expectations or the
expectation of a single quantity.

It should be noted that the estimates on the necessary number 
of observations (obtained by application of the Bernoulli and 
Chebyshev theorems and resulting from the methods by which these 
theorems are proved) are much too large. Certain methods of 
making these approximations more exact are to be found in the 
works of S. N. Bernstein, A. Ya. Khinchin, and other authors.

In practice, the application of these theorems of the law of 
large numbers is restricted by the condition that the trials or the 
random variables be independent. It has been shown in a number 
of works on probability theory that this restriction can be removed 
if the relationship between the random variables that we are 
studying is very weak. Also, certain conditions under which the law 
of large numbers is applicable to dependent random variables have 
been established, but this question has not been studied sufficiently. 

Laplace’s theorem is ordinarily called the Laplace limit 
theorem, since it establishes a limit to which the probability that 
the number of occurrences lies between the given bounds converges. 
In order to avoid the excessively high estimates for a sufficiently 
large number of observations that are given by Bernoulli’s theorem,
we often use Laplace’s theorem. However, we should note that this 
gives less definite results than does the application of Bernoulli’s 




theorem, since we do not then have an estimate of the error made 
in replacing a formula that is valid in the limit as $n \to \infty$ with the
same formula for a finite value of n.

As a check on the law of large numbers, many different experiments
(such as, for example, the tossing of coins) have been performed.
The results of these experiments show that the predictions
that one would make on the basis of the theorem turn out to be true 
in almost all experiments. (Of course, the experiments were such 
that the conditions of the theorems were satisfied.) 

The question of the applicability of the law of large numbers to 
natural phenomena is rather complicated. Assertions similar to
those that we made in certain examples must be made with suitable
restrictions and must be checked by systematic observations.


Chapter 11
CONTINUOUS RANDOM VARIABLES 


51. THE DISTRIBUTION FUNCTION OF A CONTINUOUS
RANDOM VARIABLE 


In this chapter, we shall consider random variables that may assume 
arbitrary values in some region. Let us denote by X a continuous 
random variable that can take any real value in an interval (a, b).
In such a case, it is impossible to state the probability of each
individual value and we should not pose the problem of the probabilities
of specific values since there are infinitely many of them. In the
case of continuous random variables it is meaningful only to speak of
the probability of the value falling in a certain interval. This interval
may contain the range of the variable or may intersect that range.
Usually, the problem is one of calculating the probability that a
random variable X, defined in an interval (a, b), will assume a value
in a subinterval $(\alpha, \beta)$ of (a, b). We shall consider this the basic
problem. In particular, the region (a, b) may extend from 0 to $\infty$ or
from $-\infty$ to $+\infty$.

In the case of a discrete variable, to solve the problem of the 
probability that a random variable will take a value in the interval 
specified, we needed to give a table of values of the variable and 
the probabilities of these values. If the variable is continuous, 
instead of a table (distribution of probabilities) we must be given 
certain functions (the distribution functions). 

In what follows, let us represent the basic problem graphically. 
Suppose that a segment AB on the real axis represents the set of
possible values of the random variable. Let us put the event "the
random variable X will take the value x" in correspondence with
the choosing of the point x in a random selection of a point on the
real line. If $(\alpha, \beta)$ represents the given interval, the graphical
interpretation of the basic problem is the problem of calculating the
probability that this point will fall on the segment $\alpha\beta$ (see Fig. 5).
Such a probability is a function of the coordinates of the end points
$\alpha$ and $\beta$, but giving such a function as a function of two arguments
would unnecessarily complicate the theory. Instead, the distribution 
function is given. This distribution function is defined as follows: 




a one-argument function F(x) whose value is equal to the probability
that the random variable will take a value less than the argument
x of this function is known as a distribution function of a
continuous random variable.* By definition, we then have the equation

\[ P(X < x) = F(x), \tag{11.1} \]

in which the left side should be read "the probability that the random
variable X will assume a value less than x." It follows from
the definition of a distribution function F(x) that

\[ F(x) = 0 \quad \text{if } x < a, \qquad F(x) = 1 \quad \text{if } x > b. \tag{11.2} \]

If x increases monotonically from a to b, the function F(x) will also
increase monotonically from 0 to 1 since the increase in x widens
the region of possible values of the variable and hence increases the
probability that the variable will assume a value in that region.


Fig. 5. A representation of the basic problem for continuous random
variables.


On the basis of these properties, we may assert that a graph of 
a distribution function will have the shape shown in Figure 6. To 
the left of the domain of definition, the graph is a straight line 
coinciding with the x-axis. Between a and b, it is a monotonically
ascending curve with ordinates varying from 0 to 1. To the right
of the point b, it is a straight line parallel to the abscissa but at
unit distance above it.** 


Fig. 6. The graph of a distribution function of a continuous random
variable. OA = a, OC = x, OB = b, CD = P(X < x).


*Note that the concept of a distribution function can be extended without change to the
case of discrete variables.

**It is not difficult to show that the graph of a distribution function in the case of a
discrete random variable is that of a step function (like the profile of a stairway).




To show the expediency of having a distribution function given, let 
us show that if we know it we can easily solve the basic problem 
stated above. 

Suppose that we wish to determine the probability of a random
variable assuming a value in an interval $(\alpha, \beta)$. Then,

\[ F(\beta) = P(X < \beta), \qquad F(\alpha) = P(X < \alpha). \]

Since the interval from a to $\beta$ consists of two non-intersecting
parts (namely, the part from a to $\alpha$ and the part from $\alpha$ to $\beta$), the
random event of falling to the left of the point $\beta$ has two
mutually exclusive parts, namely, falling to the left of $\alpha$ and falling
between $\alpha$ and $\beta$. From the theorem on the addition of probabilities,
we may write

\[ P(X < \alpha) + P(\alpha < X < \beta) = P(X < \beta). \]

Therefore,

\[ P(\alpha < X < \beta) = P(X < \beta) - P(X < \alpha) \]

or

\[ P(\alpha < X < \beta) = F(\beta) - F(\alpha). \tag{11.3} \]


The last equation shows that if we know the distribution function of 
a continuous random variable, it is quite simple to solve the basic 
problem; the probability that a random variable will assume a value
in a certain interval is equal to the difference between the values 
of the distribution function at the upper and lower end points of the 
interval. 
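
The rule just stated is easy to put into computational form. In the sketch below, the distribution function F(x) = x² on the interval (0, 1) is a hypothetical example of ours, not one taken from the text, and the function names are likewise our own.

```python
def F(x):
    """Distribution function of a hypothetical variable on (0, 1): F(x) = x**2."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x * x

def interval_probability(cdf, alpha, beta):
    """P(alpha < X < beta) = F(beta) - F(alpha), as in formula (11.3)."""
    if alpha > beta:
        raise ValueError("alpha must not exceed beta")
    return cdf(beta) - cdf(alpha)

p_mid = interval_probability(F, 0.25, 0.75)   # an interval inside (0, 1)
p_all = interval_probability(F, -1.0, 2.0)    # an interval containing the whole range
print(p_mid, p_all)
```

An interval that contains the whole range of the variable receives probability 1, in agreement with the limiting values of the distribution function.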

One of the problems in the study of random variables consists 
in choosing certain numerical characteristics of a random variable 
and in finding a method of calculating them. One of these characteristics,
the median, is closely connected with the distribution
function. The median of a random variable is that value determined
by the condition that the values of the variable greater and less than
the median be equally likely; that is,

\[ P(X < m) = P(X > m) = \frac{1}{2}. \tag{11.4} \]


If the distribution function F(x) is given, the median is determined 
by the equation F(x) = 0.5, which, because of the monotonicity of
F(x), will have a unique solution. On the graph of F(x), the median 
is represented in an obvious manner: we draw a straight line 
parallel to the x-axis at a distance 0.5 above it; the abscissa of the 
point of intersection of this line with the curve represents the 
median. 
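
When F(x) = 0.5 cannot be solved in closed form, the monotonicity of F(x) permits a numerical solution by repeated halving of the interval. The sketch below uses bisection, a method of our choosing rather than one given in the text, and again takes the hypothetical distribution function F(x) = x² on (0, 1).

```python
def median_by_bisection(cdf, a, b, tol=1e-12):
    """Solve F(x) = 0.5 on [a, b] for a continuous, increasing F by bisection."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < 0.5:
            lo = mid      # the median lies to the right of mid
        else:
            hi = mid      # the median lies at or to the left of mid
    return 0.5 * (lo + hi)

def F(x):
    # Hypothetical example: F(x) = x**2 on (0, 1), 0 to the left, 1 to the right.
    return min(max(x, 0.0), 1.0) ** 2

m = median_by_bisection(F, 0.0, 1.0)
print(m, F(m))
```

For this F the median is the square root of 0.5, which the iteration approaches to within the stated tolerance.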


52. PROBABILITY DENSITY 


Suppose that we know the distribution function F(x) of a random 
variable X defined in the interval (a, b). When we calculate the




probability of the random variable taking a value in the part of
this interval from x to $x + \Delta x$ (where $\Delta x > 0$) we obtain

\[ P(x < X < x + \Delta x) = F(x + \Delta x) - F(x). \]

If we divide both sides of this equation by $\Delta x$, we have

\[ \frac{P(x < X < x + \Delta x)}{\Delta x} = \frac{F(x + \Delta x) - F(x)}{\Delta x}. \tag{11.5} \]


The quotient on the left is the ratio of the probability of the variable
falling on the segment of length $\Delta x$ to the length $\Delta x$. This
quantity can be called the average probability density on the segment
$\Delta x$ with origin at the point x. The quotient on the right is the
ratio of the corresponding increase in the distribution function to
the increase of the argument. In this equation, let us pass to the
limit as $\Delta x \to 0$, assuming that F(x) is a differentiable function. The
limit of the left side is the probability density at the point x in
analogy with the concept of linear density at a point. The limit of
the right side is equal to the derivative of the distribution function.
If we denote the probability density by p(x), we have a simple
relationship between the probability density and the distribution
function: the probability density is equal to the derivative of the
distribution function with respect to its argument:*

\[ p(x) = \frac{dF(x)}{dx}. \tag{11.6} \]


Let us note certain properties of the probability density:

I. The probability density is a nonnegative quantity at all
values of the argument.

This follows from the fact that p(x) is the derivative of a
nondecreasing function F(x).

II. If a random variable is defined in a finite interval from a
to b, the probability density p(x) is 0 when x < a or x > b.

This is true because values of x outside the region (a, b) are
impossible and hence the probability of the variable assuming a
value in any interval outside that region is equal to 0. Therefore,
the probability density is also equal to 0.

Because of this property, we may always formally assume that
a random variable is given for the entire real line from $-\infty$
to $+\infty$, but with p(x) having different types of values for three
subintervals of the real line; that is, p(x) is equal to 0 to the left
of a, positive from a to b, and equal to 0 to the right of b.

III. The value of p(x) within the interval or at its end points can
be any nonnegative number or $+\infty$.


*Because of this equality, the probability density is sometimes called the
"differential distribution function" and the distribution function that we defined above is
called the "integral distribution function."




This follows from the fact that the derivative of the monotonic 
increasing function F(x) defined in this interval is in no way
bounded. In connection with this, we should note that we must not 
confuse the probability density with the probability (whose values 
never exceed unity). 

IV. From the definition of probability density, we have the ap- 
proximate equation 


\[ P(x < X < x + \Delta x) \approx p(x)\,\Delta x, \tag{11.7} \]

where $\Delta x$ is a sufficiently small positive number.

In computation, this equation is often written as if it were exact.
This introduces no error if the logic of the problem indicates a
passage to the limit as $\Delta x \to 0$.

V. The solution of the basic problem is given by the formula

\[ P(\alpha < X < \beta) = \int_\alpha^\beta p(x)\,dx. \tag{11.8} \]

This formula is obvious if we remember the solution of the basic
problem by means of the distribution function F(x) and consider the
relationship between p(x) and F(x). (Specifically, F(x) is a primitive
of p(x).)

VI. [fa random variable is given ina finite region from a to b, 
then 


b 
foe@ac= 1, (11.9) 


that is, the area between the x-axis and the graph of the probability 
density is equal to l. 

This is true because the left side is, by formula (11.8), the 
probability that the variable will fall in the interval from a to 06. 
and, by hypothesis, the variable can take values only in this in- 
terval. This means that the probability of its falling in the interval 
(a, 6) is equal to-unity. The equation written above is sometimes 
called the condition of normalization of the probability density. This 
means that condition (11.9) must be imposed on the probability 
density if this function is to some degree arbitrarily chosen. 

The graph of the probability density is usually called the dis- 
tribution curve. lt can have one or several maxima in the interval 
in question. The value of the random variable corresponding to the 
maximum ordinate of this curve is called the mode. 
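
The relations (11.6), (11.8), and (11.9) can be verified numerically for any particular density. The sketch below takes the hypothetical density p(x) = 2x on (0, 1), with F(x) = x²; the example, the midpoint rule, and the central difference used for the derivative are all choices of ours, made only for illustration.

```python
def p(x):
    """Hypothetical density on (0, 1): p(x) = 2x, zero elsewhere."""
    return 2.0 * x if 0.0 < x < 1.0 else 0.0

def F(x):
    """Matching distribution function: F(x) = x**2 on (0, 1)."""
    return min(max(x, 0.0), 1.0) ** 2

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f from lo to hi."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

total = integrate(p, 0.0, 1.0)                    # normalization condition (11.9)
prob = integrate(p, 0.2, 0.5)                     # basic problem, formula (11.8)
slope = (F(0.6 + 1e-6) - F(0.6 - 1e-6)) / 2e-6    # numerical derivative for (11.6)

print(total)
print(prob, F(0.5) - F(0.2))
print(slope, p(0.6))
```

Each printed pair should agree to within the accuracy of the numerical method, which shows the two descriptions of the distribution to be interchangeable.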


53. EXPECTATION, VARIANCE, AND MOMENTS 


In Chapter 9, the expectation of a random variable was defined only 
for those random variables that could assume only a finite number 
of discrete values (though the definition could easily be extended to 
variables having an infinite number of discrete values). 

The concept of distribution functions of continuous random 
variables makes it possible to introduce the concept of the ex- 
pectation of a continuous random variable. 




Definition 1. If X is a continuous random variable that assumes
values x in the interval from a to b (where a < b) and if p(x) is the
probability density of this variable, its expectation is defined to be

\[ E(X) = \bar{x} = \int_a^b x\,p(x)\,dx. \tag{11.10} \]


Formula (11.10) may be considered as defining the theoretical 
average value of a continuous random variable. 

Definition 2. The expectation of a random variable is called 
the center of its distribution. 

In the probability density graph, the point on the x-axis whose
abscissa is the expectation of the random variable is also called
the center of distribution. (This name is explained by the fact
that the center of gravity of a nonhomogeneous line segment the
abscissas of whose ends are equal to a and b is determined by the
same formula as the expectation if the density at the point x is
equal to p(x).)

Definition 3. The expectation of an arbitrary single-valued
continuous function $\varphi(X)$ of a continuous random variable X whose
probability density is p(x) is defined by the formula

\[ E\{\varphi(X)\} = \int_a^b \varphi(x)\,p(x)\,dx. \tag{11.11} \]

This definition is justified by the fact that the probability of
$\varphi(X)$ assuming values between $\varphi(x_1)$ and $\varphi(x_2)$ is equal to the
probability that X will assume a value between $x_1$ and $x_2$ no matter what
$x_1$ and $x_2$ are.

Formula (11.11) defines the theoretical average value of the
function $\varphi(X)$.

All the properties of the expectation that were given for discrete 
variables remain valid for continuous ones. 

Let us turn now to the concept of the moments of a random 
variable. 

Definition 4. The initial moment of order s of a random variable
X defined in an interval (a, b) with probability density p(x) is
the number

\[ \nu_s = \int_a^b x^s p(x)\,dx, \tag{11.12} \]

that is, the expectation of the sth power of the random variable
(where s is a positive number).
From the normalization condition and the definition of expectation,

\[ \nu_0 = 1, \qquad \nu_1 = E(X) = \bar{x}. \]




Definition 5. The central moment of order s of a random variable
X with probability density p(x) is the number

\[ \mu_s = \int_a^b (x - \bar{x})^s p(x)\,dx, \tag{11.13} \]

that is, the expectation of the sth power of the deviation of the
random variable from its expectation.
It follows from the definition that

\[ \mu_0 = 1, \tag{11.14} \]
\[ \mu_1 = 0, \tag{11.15} \]

since

\[ \mu_0 = \int_a^b p(x)\,dx = 1, \]
\[ \mu_1 = \int_a^b x\,p(x)\,dx - \bar{x} \int_a^b p(x)\,dx = \bar{x} - \bar{x} \cdot 1 = 0. \]

Furthermore,

\[ \mu_2 = \int_a^b (x - \bar{x})^2 p(x)\,dx = \sigma_x^2, \tag{11.16} \]


that is, the central moment of second order is the expectation of 
the square of the deviation of the random variable from its ex- 
pectation. From the definition given in the chapter on discrete 
random variables, such an expectation is called the variance. If 
we extend this definition to continuous random variables, we may 
say that the central moment of second order is the variance of a 
continuous random variable. The properties given in Section 44 for 
the variance of a random variable remain valid for continuous 
ones, because in the proof of these properties we did not use the 
discreteness of the variables. Let us derive formulas expressing 
the central moments in terms of the initial moments. First, we 
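expand the factor $(x - \bar{x})^s$ according to the binomial formula. We
obtain

\[
\begin{aligned}
\mu_s = {} & \int_a^b x^s p(x)\,dx - s\bar{x} \int_a^b x^{s-1} p(x)\,dx
  + \frac{s(s-1)}{2!}\,\bar{x}^2 \int_a^b x^{s-2} p(x)\,dx + \cdots \\
  & + (-1)^k C_s^k\,\bar{x}^k \int_a^b x^{s-k} p(x)\,dx + \cdots
  + \frac{s(s-1)}{2!}\,(-1)^{s-2}\,\bar{x}^{s-2} \int_a^b x^2 p(x)\,dx \\
  & + (-1)^{s-1} s\,\bar{x}^{s-1} \int_a^b x\,p(x)\,dx + (-1)^s\,\bar{x}^s \int_a^b p(x)\,dx.
\end{aligned}
\]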




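All the integrals are equal to the initial moments:

\[ \nu_s,\ \nu_{s-1},\ \ldots,\ \nu_2,\ \nu_1,\ \nu_0. \]

As was shown above, $\nu_0 = 1$ and $\nu_1 = \bar{x}$. Therefore, the last two
terms can be simplified and, consequently,

\[ \mu_s = \sum_{k=0}^{s-2} (-1)^k C_s^k\,\nu_{s-k}\,\bar{x}^k + (-1)^{s-1} (s - 1)\,\bar{x}^s. \tag{11.17} \]

In particular,

\[
\begin{aligned}
\mu_2 &= \nu_2 - \bar{x}^2, \\
\mu_3 &= \nu_3 - 3\nu_2\bar{x} + 2\bar{x}^3, \\
\mu_4 &= \nu_4 - 4\nu_3\bar{x} + 6\nu_2\bar{x}^2 - 3\bar{x}^4.
\end{aligned}
\tag{11.18}
\]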
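
The passage from initial to central moments can be checked with exact rational arithmetic. The sketch below evaluates the full binomial expansion and compares it with the three special cases; the density p(x) = 2x on (0, 1), whose initial moments are 2/(s + 2), is a hypothetical example of ours, and the function name is likewise our own.

```python
from fractions import Fraction
from math import comb

def central_from_initial(nu, s):
    """Central moment of order s from the initial moments nu[0], ..., nu[s],
    using the binomial expansion of (x - mean)**s term by term."""
    mean = nu[1]
    return sum((-1) ** k * comb(s, k) * nu[s - k] * mean ** k
               for k in range(s + 1))

# Hypothetical density p(x) = 2x on (0, 1): the s-th initial moment is 2 / (s + 2).
nu = [Fraction(2, s + 2) for s in range(5)]
mean = nu[1]

mu2 = central_from_initial(nu, 2)
mu3 = central_from_initial(nu, 3)
mu4 = central_from_initial(nu, 4)
print(mu2, mu3, mu4)
```

Because fractions are used, the comparison with the special cases of order 2, 3, and 4 is exact and not subject to rounding.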


It would have been possible to treat the moments as (free) numerical characteristics
of the random variable, but the dimensions of the moment of order s are equal to the
dimensions of X taken to the power s, which is inconvenient. It is more natural to treat
the numbers $\sqrt[s]{\mu_s}$, in particular $\sigma = \sqrt{\mu_2}$, as the numerical characteristics. Use of the
central moment excludes the influence of the origin from which the variable X is
measured, but it does not exclude the influence of the scale used in the measurement of this
quantity. For that reason we should take the numbers

\[ m_s = \frac{\sqrt[s]{\mu_s}}{\sigma} \quad \text{for } s > 2 \]

for the moment numerical characteristics. The numbers $m_s$ do not depend on the way in
which the quantity X is measured; that is, they do not depend either on the choice of
origin or on the choice of units. These are pure numbers and they can be used for
comparing different random variables with each other.

The moment characteristics supplement the basic characteristics, which depend
both on the point chosen as origin and on the scale: that is, they supplement the average
value $\bar{x}$, the median m, and the mean square deviation $\sigma_x$. For odd values of s, the
smallness of the number $m_s$ can serve as an indication that the distribution (the
probability density) is nearly symmetrical about the center of distribution.


54. UNIFORM PROBABILITY DISTRIBUTION 


To illustrate the general situation, let us take the simplest case, 
the one in which the probability density is constant in the interval 
from a to b. In this case, the graph of the probability density is
a line segment parallel to the x-axis (Fig. 7a). In order that the
area bounded above by the distribution curve be equal to unity
(the normalization condition), we must require that the constant
value of the probability density be equal to $1/(b - a)$.

The distribution function F(x) can be found by integrating the
probability density with the initial condition F(a) = 0:

\[ F(x) = \int_a^x \frac{dt}{b - a}, \]

and hence,

\[ F(x) = \frac{x - a}{b - a}. \]

Fig. 7. The case of uniform distribution. (a) The graph of the
probability density; (b) the graph of the distribution function.


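The graph of the distribution function is a line segment intersecting
the x-axis at the point x = a and having ordinate equal to unity at
x = b (Fig. 7b). The probability that X will have a value between $\alpha$
and $\beta$ is obtained from the formula

\[ P(\alpha < X < \beta) = \int_\alpha^\beta \frac{dx}{b - a} = \frac{\beta - \alpha}{b - a}. \]

The expectation is

\[ \bar{x} = \int_a^b \frac{x\,dx}{b - a} = \frac{b^2 - a^2}{2(b - a)}, \]

or

\[ \bar{x} = \frac{a + b}{2}. \]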

This result is rather obvious. The distribution is symmetric 
and therefore the center of distribution must be at the geometric 
center of the interval. The median in the case of uniform distri- 
bution coincides with the expectation. 




If we displace the origin of a uniformly distributed random
variable to the center of distribution, we obtain the new variable

\[ Y = X - \bar{x}. \]

Obviously,

\[ E(Y) = 0. \]

The probability density of Y can be written in the form

\[ p(y) = \frac{1}{2c} \quad (-c < y < c), \]

where 2c = b − a is the length of the interval.

The variance of the uniformly distributed variable Y is equal to
the variance of the variable X since these two variables differ only
by the displacement of the origin, which has no effect on the
variance. This common variance is

\[ \sigma_x^2 = \int_{-c}^{c} \frac{y^2\,dy}{2c} = \frac{c^2}{3}. \]

In the case of uniform distribution, the quantity c is the absolute
value of the limiting deviation from the center. It is related to $\sigma_x$
by the equation $c = \sigma_x\sqrt{3}$.
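
The characteristics of the uniform distribution can be gathered into a few short functions. The sketch below is ours: the function names and the end points 2 and 8 are arbitrary, and the variance is taken as the square of the half-length of the interval divided by 3, in agreement with the relation between c and the mean square deviation stated above.

```python
import math

def uniform_cdf(x, a, b):
    """F(x) = (x - a) / (b - a) on (a, b); 0 to the left of a, 1 to the right of b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def uniform_mean(a, b):
    """Center of the distribution: the midpoint of the interval."""
    return (a + b) / 2.0

def uniform_variance(a, b):
    """Variance c**2 / 3, where c is the half-length of the interval."""
    c = (b - a) / 2.0
    return c * c / 3.0

a, b = 2.0, 8.0                       # arbitrary end points for illustration
c = (b - a) / 2.0
sigma = math.sqrt(uniform_variance(a, b))

print(uniform_cdf(5.0, a, b))         # F at the center of the interval
print(uniform_mean(a, b), sigma, sigma * math.sqrt(3.0), c)
```

The value of the distribution function at the center is one half, which confirms that the median coincides with the expectation.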


55. FORMULATION OF LYAPUNOV’S THEOREM. THE NORMAL
PROBABILITY DISTRIBUTION


In the study of natural phenomena, we need to deal with random 
variables that can be considered as sums of a large number of in- 
dependent random variables. Lyapunov’s limiting theorem charac- 
terizes (with certain restrictions) the distribution law for such 
sums. This theorem is formulated as follows: 

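Suppose that

\[ Z = X^{(1)} + X^{(2)} + \cdots + X^{(n)}, \]

where $X^{(1)}, X^{(2)}, \ldots, X^{(n)}$ are mutually independent random variables
having given expectations, variances, and "absolute" central
moments of order $2 + \alpha$:

\[ E(X^{(k)}) = \bar{x}_k, \qquad \mathrm{Var}(X^{(k)}) = \sigma_k^2, \qquad E\left( |X^{(k)} - \bar{x}_k|^{2+\alpha} \right) = \mu_k, \]

where $\alpha$ is some positive number. Suppose that

\[ \lim_{n \to \infty} \frac{\sum_{k=1}^{n} \mu_k}{\sigma^{2+\alpha}} = 0. \]

Then, the probability that

\[ t_1 \sigma < Z - \bar{z} < t_2 \sigma, \tag{11.19} \]

where $\bar{z} = \sum_{k=1}^{n} \bar{x}_k$ and $\sigma^2 = \sum_{k=1}^{n} \sigma_k^2$, will, as $n \to \infty$, approach the limiting
value

\[ \frac{1}{\sqrt{2\pi}} \int_{t_1}^{t_2} e^{-t^2/2}\,dt. \tag{11.20} \]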


The assertion of this theorem can be formulated approximately
in a somewhat different manner: for a sufficiently large number n
of random variables added together, the probability of the
inequality $\alpha < Z < \beta$ will be arbitrarily close to the quantity

\[ \frac{1}{\sigma\sqrt{2\pi}} \int_\alpha^\beta e^{-\frac{(t - \bar{z})^2}{2\sigma^2}}\,dt. \tag{11.20a} \]


It is clear from this formulation of Lyapunov’s limit theorem
that when the number of random variables is increased without
bound, the probability density of the sum approaches the function

\[ p(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x - a)^2}{2\sigma^2}}, \tag{11.21} \]

provided that the random variables being summed satisfy the
conditions of the theorem. The distribution law with probability density
given by formula (11.21) is called a normal distribution, and we say
that the random variable having such a distribution is normal. The
numbers a and $\sigma$ should be thought of as parameters.

An analogous distribution law is given by Laplace’s approximate 
formula for the problem of repeated trials. In the theory and
application of random variables, the normal distribution is often used
by extending the law derived for a random variable of the defined
type (number of occurrences) to arbitrary random variables.
Lyapunov’s theorem states the conditions of applicability of a
normal distribution. It is suitable in those cases in which the random
variable being studied can be treated as the sum of a large number 
of random variables. The essential point is that the random vari- 
ables being summed may obey any law of distribution. 

S. N. Bernstein has given a supplementary condition for the
numbers $\mu_k$ to exist and for the ratio of the sum of these numbers
to the given power of $\sigma$ to approach 0, namely that none of the



individual random variables should differ too much from the others 
in magnitude or variance. Attempts have been made to formulate 
a distribution law for the sum of uniformly distributed random 
variables. These attempts have shown that even when the number 
of variables is as small as twenty, a good approximation to the 
exact normal law is obtained. 
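
The remark about sums of uniformly distributed variables can be illustrated by a small simulation. The sketch below is not a reconstruction of the attempts mentioned in the text: the number of terms (twenty), the limits of one mean square deviation on either side, the number of repetitions, and the seed are our own choices, and a pseudo-random generator stands in for the random variables.

```python
import math
import random

def normal_cdf(t):
    """Standard normal distribution function, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def empirical_probability(n, t1, t2, trials, rng):
    """Fraction of trials in which t1*sigma < Z - z_bar < t2*sigma,
    where Z is a sum of n independent variables uniform on (0, 1)."""
    z_bar = n * 0.5                   # sum of the expectations
    sigma = math.sqrt(n / 12.0)       # each term has variance 1/12
    hits = 0
    for _ in range(trials):
        z = sum(rng.random() for _ in range(n))
        if t1 * sigma < z - z_bar < t2 * sigma:
            hits += 1
    return hits / trials

rng = random.Random(12345)            # fixed seed so that the run is repeatable
observed = empirical_probability(20, -1.0, 1.0, 20_000, rng)
limit = normal_cdf(1.0) - normal_cdf(-1.0)   # value of the limiting integral (11.20)
print(observed, limit)
```

The observed frequency and the limiting value should differ only by an amount of the order of the sampling error of the simulation.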


56. AN APPROXIMATE DERIVATION 
OF THE NORMAL LAW 


The proof of Lyapunov’s theorem is too long to include in the present volume, Therefore, 
we shall consider a simplified derivation of the normal law proposed by Pearson, We 
introduce the following conditions: (1) The values of therandom variable are deviations 
from some constant value a; (2) Each of the individual deviations is the result of a random 
action of a large number of causes, each of which effects a small random deviation; (3) 
These causes act independently of each other. 

These conditions were first introduced in the theory of measurement errors, where
the measured quantity has a definite numerical value and the different numbers that we
obtain on taking measurements are the result of various random causes (a chance gust
of wind, the jolt resulting from the slamming of a door far from the laboratory, etc.). The
difference between the observed and the (unknown) actual value is a random error of
observation.

When we speak of the distribution of a quantity such as the inclination of a set of
asteroids, we may speak of the deviation from some constant value, but these deviations
should not be called errors. In this case, the second of the conditions listed is less
justified since the deviation of an individual value from some constant level is the result
of the action not only of random causes but also of systematic ones (for example,
disturbances from large planets in the case of the asteroids). Therefore, the second
condition should, in this case, be considered only as a simplification in setting up a
theoretical scheme.

We can give a clearer formulation of these conditions as follows: 

(1) The deviation U of the variable X from the value a is caused by the action of n
causes, each of which brings about a deviation $+\varepsilon$ or $-\varepsilon$ that is small in absolute value.

(2) Elementary deviations that have equal absolute values are equally likely:

$$P(+\varepsilon) = P(-\varepsilon) = \frac{1}{2}.$$

(3) The causes of the elementary deviations are mutually independent; that is, the
probability that one of these causes will effect a deviation of $+\varepsilon$ (or $-\varepsilon$) does not depend
on the deviations resulting from the remaining causes.

We note that assumptions (2) and (3) impose quite stringent restrictions on the
set-up. Nevertheless, they may be considered as sufficiently close to the actual conditions
in the theory of errors. On the other hand, the restriction that the absolute value of the
deviation be the same for all the variables is not essential, but is introduced only to
simplify subsequent calculations.

Starting with these three assumptions, let us now find the probability density of the
variable U. Suppose that k causes have brought about positive deviations, and that the
remaining n - k causes have brought about negative deviations. Then, the value of the
net deviation U, which we shall denote by $u_k$, will be

$$u_k = k\varepsilon + (n - k)(-\varepsilon) = (2k - n)\varepsilon.$$

The probability of such a value is calculated from formula (8.1) for the problem of
repeated trials, since the effect of each of the causes can be compared to a trial in which
the event (deviation $+\varepsilon$) may or may not occur. By the second assumption, p = q = 1/2.
Therefore,

$$P_{k,n} = \frac{n!}{k!\,(n-k)!}\left(\frac{1}{2}\right)^{n}.$$



Since our problem consists in establishing the relationship between the variable U and the
probability density, we shall assume now that not k but k + 1 causes yielded values of
$+\varepsilon$. Then,

$$u_{k+1} = (2k + 2 - n)\varepsilon, \qquad P_{k+1,n} = \frac{n!}{(k+1)!\,(n-k-1)!}\left(\frac{1}{2}\right)^{n}.$$


Let us denote by $\Delta u$ the length of the interval $(u_k, u_{k+1})$ and let us denote its center
by u. From the preceding formulas,

$$\Delta u = u_{k+1} - u_k = 2\varepsilon, \qquad u = \frac{u_{k+1} + u_k}{2} = (2k - n + 1)\varepsilon.$$

Let us assume that the probability that the random variable U will take values in the
interval $(u_k, u_{k+1})$ is approximately equal to the average of $P_{k,n}$ and $P_{k+1,n}$. Then, if
we denote by y the probability density of the variable U relative to the center u of the
interval in question, we obtain

$$y = \frac{P}{\Delta u},$$

where

$$P = \frac{P_{k,n} + P_{k+1,n}}{2}, \qquad \Delta u = u_{k+1} - u_k.$$

Therefore,

$$\Delta y = \frac{\Delta P}{\Delta u},$$

where

$$\Delta P = P_{k+1,n} - P_{k,n},$$

and, consequently,

$$\frac{\Delta y}{y} = \frac{\Delta P}{P}.$$


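If we calculate $\Delta P$ and P, we obtain

$$\Delta P = \left(\frac{1}{2}\right)^{n}\frac{(n - 2k - 1)\,n!}{(k+1)!\,(n-k)!}, \qquad P = \left(\frac{1}{2}\right)^{n+1}\frac{(n+1)\,n!}{(k+1)!\,(n-k)!}.$$

Therefore,

$$\frac{\Delta y}{y} = \frac{2(n - 2k - 1)}{n+1} = \frac{(n - 2k - 1)\varepsilon \cdot 2\varepsilon}{(n+1)\varepsilon^{2}}.$$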


To obtain the probability density y, we must find the relationship between y and u.
Since $(n - 2k - 1)\varepsilon = -u$ and $2\varepsilon = \Delta u$,

$$\frac{\Delta y}{y} = -\frac{u\,\Delta u}{(n+1)\varepsilon^{2}}.$$

For finite n, this equation gives an approximate relationship between the variable U, its
probability density y, and their increments. Let us now let n approach $\infty$. Then, it is
natural to assume that $\varepsilon$ approaches 0, since otherwise infinitely large deviations would
be possible without their probabilities being infinitesimally small.




Let us assume also that $(n + 1)\varepsilon^{2}$ approaches a finite positive limit $\sigma^{2}$. When we take
the limit, we obtain the differential equation of the probability density curve

$$\frac{dy}{y} = -\frac{u\,du}{\sigma^{2}}.$$

Integration of this equation yields

$$y = C e^{-\frac{u^{2}}{2\sigma^{2}}},$$

where C is an arbitrary constant and $\sigma$ is a parameter. The probability density must
satisfy the normalization condition. It follows from the equation that we have just
obtained that u is unbounded. Therefore, we have

$$\int_{-\infty}^{+\infty} C e^{-\frac{u^{2}}{2\sigma^{2}}}\,du = 1.$$

This equation makes possible the expression of C in terms of $\sigma$. If we make the
substitution $u/\sigma = t$, we obtain

$$C\sigma\int_{-\infty}^{+\infty} e^{-\frac{t^{2}}{2}}\,dt = 1.$$

It can be shown that

$$\int_{-\infty}^{+\infty} e^{-\frac{t^{2}}{2}}\,dt = \sqrt{2\pi},$$

so that

$$C = \frac{1}{\sigma\sqrt{2\pi}}.$$

Finally, the probability density of a normal distribution is written as

$$y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{u^{2}}{2\sigma^{2}}}.$$ (11.22)

In this equation, u is the deviation of the value of the random variable from some constant
number a. If, instead of the deviation, we introduce the value of the variable x (note that
x = u + a) in this equation, we obtain the probability density of a normal distribution
in the general form:

$$y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-a)^{2}}{2\sigma^{2}}}.$$ (11.23)
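
The limiting argument of this section can be checked numerically for a finite number of causes. The sketch below is illustrative only: the function names and the choice n = 400 are assumptions, and $\varepsilon$ is chosen so that $(n + 1)\varepsilon^{2} = \sigma^{2}$ holds exactly. It computes the density $y = P/\Delta u$ of the elementary-deviation scheme at the center u of an interval and compares it with the normal density (11.22).

```python
import math

def scheme_density(n, k, sigma=1.0):
    """Return (u, y): interval center and density from the elementary-deviation scheme."""
    eps = sigma / math.sqrt(n + 1)        # chosen so that (n + 1) * eps**2 == sigma**2
    p_k = math.comb(n, k) * 0.5 ** n      # P_{k,n}
    p_k1 = math.comb(n, k + 1) * 0.5 ** n # P_{k+1,n}
    u = (2 * k - n + 1) * eps             # center of the interval (u_k, u_{k+1})
    y = 0.5 * (p_k + p_k1) / (2 * eps)    # y = P / delta_u with delta_u = 2 * eps
    return u, y

def normal_density(u, sigma=1.0):
    """Normal density (11.22) for the deviation u."""
    return math.exp(-u * u / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

n = 400
for k in (200, 210, 220):
    u, y = scheme_density(n, k)
    print(round(u, 3), round(y, 4), round(normal_density(u), 4))
```

For much larger n the factor `0.5 ** n` underflows in floating point, so a logarithmic form would be needed; that refinement is omitted here.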


57. PARAMETERS OF THE NORMAL LAW. GAUSS' CURVE 


To clarify the meaning of the parameters a and $\sigma$ in the normal law,
let us find the expectation and the variance of a random variable 
obeying that law. 




From formula (11.10), we obtain

$$E(X) = \int_{-\infty}^{+\infty} x\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-a)^{2}}{2\sigma^{2}}}\,dx.$$

Let us make the change of variable

$$\frac{x - a}{\sigma} = t, \qquad dx = \sigma\,dt.$$

We then obtain

$$E(X) = \frac{\sigma}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} t\,e^{-\frac{t^{2}}{2}}\,dt + a\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{-\frac{t^{2}}{2}}\,dt.$$

In the first term, the integrand is an odd function of the argument t.
The integral of such a function over the interval $(-\infty, +\infty)$ is
equal to 0. The integral in the second term expresses the normalization
condition for a normal law of the particular form (a = 0,
$\sigma$ = 1). Therefore, this integral is equal to unity. Consequently,

$$E(X) = \bar{x} = a.$$

Thus, the parameter a is the average value of the random variable.


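Let us now find the variance of a normally distributed variable.
From formula (11.16), we have

$$\operatorname{var} X = \int_{-\infty}^{+\infty} (x - a)^{2}\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-a)^{2}}{2\sigma^{2}}}\,dx.$$

If we again make the substitution $(x - a)/\sigma = t$, we obtain

$$\operatorname{var} X = \frac{\sigma^{2}}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} t^{2}\,e^{-\frac{t^{2}}{2}}\,dt.$$

If we use the familiar formula for integration by parts

$$\int_{a}^{b} u\,dv = uv\,\Big|_{a}^{b} - \int_{a}^{b} v\,du,$$

setting

$$u = t, \qquad dv = t\,e^{-\frac{t^{2}}{2}}\,dt,$$

so that

$$du = dt, \qquad v = -e^{-\frac{t^{2}}{2}},$$

we obtain

$$\operatorname{var} X = -\frac{\sigma^{2}}{\sqrt{2\pi}}\,t\,e^{-\frac{t^{2}}{2}}\,\Big|_{-\infty}^{+\infty} + \sigma^{2}\int_{-\infty}^{+\infty}\frac{1}{\sqrt{2\pi}}\,e^{-\frac{t^{2}}{2}}\,dt.$$

The first term is equal to 0 because the exponential function $e^{-t^{2}/2}$
decreases with increase in t more rapidly than any power of t
increases; in particular, it decreases more rapidly than t itself
increases. The integral in the second term is equal to unity by the
normalization condition. Therefore,

$$\operatorname{var} X = \sigma^{2},$$ (11.24)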


that is, the parameter $\sigma$ in a normal law of distribution is the
mean square deviation. If we set $a = \bar{x}$ in the formula for the
normal law, we can write it in the form

$$y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\bar{x})^{2}}{2\sigma^{2}}}.$$ (11.25)


If the average value $\bar{x}$ is equal to 0 and the mean square deviation
$\sigma$ is unity, we have the so-called standard normal law of distribution:

$$\varphi = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{z^{2}}{2}}.$$ (11.26)

Tables (cf. Table I at the end of the book) have been compiled for
the probability density $\varphi$ of such a standard law. We note that
equation (11.25) can be reduced to the standard form (11.26) by
means of the substitution

$$\varphi = \sigma y, \qquad z = \frac{x - \bar{x}}{\sigma}.$$ (11.27)


Let us investigate the shape of the curve of a normal distribu- 
tion (11.25), frequently known as Gauss’ curve. 

The following are the more elementary results of an analysis 
of this curve. 

The entire curve is located on one side of the x-axis (which is
its asymptote) and is symmetric about the center of distribution
$x = \bar{x}$. The curve has a maximum equal to $\frac{1}{\sigma\sqrt{2\pi}}$ at the center of
the distribution $x = \bar{x}$, and it has two inflection points at $x = \bar{x} \pm \sigma$.

From formulas (11.27), Gauss’ curve (11.25) with arbitrary $\sigma$
can be obtained from the standard curve (11.26) corresponding to
$\sigma = 1$ and $\bar{x} = 0$ by dividing the ordinates by $\sigma$, multiplying the
abscissas by $\sigma$, and then shifting them by an amount $\bar{x}$. Thus,
for $\sigma > 1$, we obtain a curve that is more extended along the
horizontal axis with a lower maximum ordinate than is the case
with a standard curve. For $\sigma < 1$, the maximum ordinate is greater
than for $\sigma = 1$. Figure 8 shows Gauss’ curves for $\sigma = 1/2$, $\sigma = 1$,
and $\sigma = 2$ (the scale being the same for all three curves).


Fig. 8. Curves of a normal distribution for various values of the
variance $\sigma^{2}$ (panels: $\sigma = 1/2$, $\sigma = 1$, $\sigma = 2$).


58. A FUNCTION OF A NORMAL DISTRIBUTION. 
CALCULATION OF PROBABILITIES 


As was noted at the beginning of the preceding chapter, the basic 
problem in the study of a continuous random variable is the 
calculation of the probability 


$$P(\alpha < X < \beta).$$


This problem is usually solved by means of formula (11.3) in con- 
nection with the distribution function F (x). 

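Let us find this function for a normal law. Since F(x) is a primitive
with respect to the probability density p(x) and $F(-\infty) = 0$,

$$F(x) = \int_{-\infty}^{x} p(t)\,dt = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{(t-\bar{x})^{2}}{2\sigma^{2}}}\,dt.$$ (11.28)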




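Because of the symmetry of the normal law about the center of
distribution, $F(\bar{x}) = 1/2$. Therefore,

$$F(x) = \frac{1}{2} + \frac{1}{\sigma\sqrt{2\pi}}\int_{\bar{x}}^{x} e^{-\frac{(t-\bar{x})^{2}}{2\sigma^{2}}}\,dt.$$

If, for a standard normal law, we set $\bar{x} = 0$, $\sigma = 1$, and $x = z$, we
have

$$F(z) = \frac{1}{2} + \frac{1}{\sqrt{2\pi}}\int_{0}^{z} e^{-\frac{t^{2}}{2}}\,dt.$$ (11.29)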


The second term on the right is usually denoted by $\Phi(z)$ and is
called the probability integral. In this notation,

$$F(z) = \frac{1}{2} + \Phi(z).$$

If in the expression for F(x), we make the change of variable
$(t - \bar{x})/\sigma = z$, we get

$$F(x) = \frac{1}{2} + \Phi\left(\frac{x - \bar{x}}{\sigma}\right).$$ (11.30)

Thus, by means of the probability integral $\Phi(z)$, it is easy to find
the distribution function for a normal law with given values of $\bar{x}$
and $\sigma$. As was mentioned in Section 48, detailed tables (cf. Table
III at the end of the book) have been compiled for the function $\Phi(z)$.*

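According to formulas (11.3) and (11.30), the probability that
the random variable will take some value in the interval $(\alpha, \beta)$ is
expressed by the formula

$$P(\alpha < X < \beta) = \Phi\left(\frac{\beta - \bar{x}}{\sigma}\right) - \Phi\left(\frac{\alpha - \bar{x}}{\sigma}\right).$$ (11.31)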


If, instead of the random variable X, we consider its deviation from
its mean value $U = X - \bar{x}$, this formula takes the form

$$P(A < U < B) = \Phi\left(\frac{B}{\sigma}\right) - \Phi\left(\frac{A}{\sigma}\right),$$ (11.32)

where $A = \alpha - \bar{x}$ and $B = \beta - \bar{x}$ are the end points of the interval
containing those values of the variable U for which the probability
is being sought.

*Sometimes, we are given tables either of $2\Phi(z)$ or

$$\frac{2}{\sqrt{\pi}}\int_{0}^{z} e^{-t^{2}}\,dt$$

instead of the function $\Phi(z)$. By a proper substitution, one of these functions can easily
be converted into the other.
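
Where tables of $\Phi(z)$ are not at hand, formula (11.31) can be evaluated directly. The following sketch assumes that the probability integral is expressed through the error function as $\Phi(z) = \frac{1}{2}\operatorname{erf}(z/\sqrt{2})$, which follows from its definition in (11.29); the function names and the numerical example are illustrative and not taken from the text.

```python
import math

def phi_integral(z):
    """Probability integral: (1/sqrt(2*pi)) * integral from 0 to z of exp(-t**2/2) dt."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

def prob_interval(alpha, beta, mean, sigma):
    """P(alpha < X < beta) for a normal law, by formula (11.31)."""
    return phi_integral((beta - mean) / sigma) - phi_integral((alpha - mean) / sigma)

# Hypothetical example: mean 10, sigma 2, interval (9, 12).
print(round(prob_interval(9.0, 12.0, 10.0, 2.0), 4))
```

Because $\Phi$ is odd, negative arguments need no special treatment.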

We can put formula (11.32) in a simpler form if the interval
(A, B) is symmetric about the center of distribution. As was shown
in Section 44, the function $\Phi(z)$ is an odd function. Therefore, it
follows from (11.32) that when B > 0 and A = -B,

$$P(|U| < B) = 2\Phi\left(\frac{B}{\sigma}\right).$$ (11.33)


Let us now see what the probabilities are for various deviations 
from the theoretical mean value in the case of a random variable 
that obeys a normal law. 

By use of formula (11.33) we calculate the probability that U
will not exceed $\sigma$ in absolute value:

$$P(|U| < \sigma) = 2\Phi(1).$$

According to the tables, $\Phi(1) = 0.3413$. This means that

$$P(|U| < \sigma) = 0.6826 \approx \frac{2}{3}.$$ (11.34)


If we assume (in the case of a large number of observations) that
the probability will differ only slightly from the relative frequency,
we may say that if the distribution is close to the normal distribution
the absolute value of the deviation will in two thirds of all
cases be no greater than $\sigma$. This assertion is often called the
sigma rule. In the theory of errors, $\sigma$ is called the mean error,
and in statistics it is called the mean square deviation or simply
the mean deviation. The name standard deviation is also used for
$\sigma$.


Let us now derive the ‘‘three-sigma’’ rule. We calculate the
probability of U not exceeding $3\sigma$ in absolute value:

$$P(|U| < 3\sigma) = 2\Phi(3) = 0.9973.$$ (11.35)

From this, we have the 3-sigma rule: if the distribution is close to
a normal distribution, it is very unlikely that the deviation will
exceed $3\sigma$ in absolute value.

In a similar manner, we may show that

$$P(|U| < 4\sigma) = 0.99994; \qquad P(|U| < 2\sigma) \approx 0.95.$$
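
The probabilities quoted for the sigma rules can be reproduced from formula (11.33). This is a minimal sketch; it assumes the identity $2\Phi(k) = \operatorname{erf}(k/\sqrt{2})$, and the function name `prob_within` is illustrative.

```python
import math

def prob_within(k):
    """P(|U| < k*sigma) = 2*Phi(k) for a normal law, by formula (11.33)."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3, 4):
    print(k, round(prob_within(k), 5))
```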


The probable deviation is defined as that number r such that the
probability of U not exceeding r in absolute value is equal to 1/2.
By definition,

$$P(|U| < r) = 0.5,$$

but

$$P(|U| < r) = 2\Phi\left(\frac{r}{\sigma}\right),$$

and therefore, $\Phi(r/\sigma) = 0.25$.
From the tables, we obtain

$$\frac{r}{\sigma} = 0.6745 \quad \text{or} \quad r \approx \frac{2}{3}\sigma.$$


If we replace the probability with the relative frequency, we may
say that in approximately half of all cases |U| will be less than the
probable deviation if the number of observations is great. It should
be emphasized again that conclusions of this nature are derived
for theoretical distributions. They can be carried over to observed
distributions only when the observed distribution differs only
slightly from the normal distribution.
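
The ratio $r/\sigma$ can also be found without tables by solving $2\Phi(r/\sigma) = 0.5$ numerically. The sketch below uses simple bisection; the function names, the bracketing interval (0, 1), and the tolerance are assumptions made for illustration.

```python
import math

def prob_within(k):
    """P(|U| < k*sigma) = 2*Phi(k) for a normal law."""
    return math.erf(k / math.sqrt(2.0))

def probable_deviation_ratio(tol=1e-10):
    """Solve 2*Phi(r/sigma) = 0.5 for r/sigma by bisection on the interval (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_within(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(probable_deviation_ratio(), 4))
```

Bisection is used here only because `prob_within` is monotonic; any root-finding method would serve.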


59. THE MOMENTS OF A NORMAL DISTRIBUTION 


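If we write the equation of the normal distribution in the form (11.23), we obtain the initial
first-order moment

$$\nu_1 = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty} x\,e^{-\frac{(x-a)^{2}}{2\sigma^{2}}}\,dx.$$

It was shown in Section 57 that

$$\nu_1 = a = \bar{x}.$$ (11.36)

We may say that the normal distribution written in the form (11.25) is the distribution of
the deviations of the random variable from its theoretical mean value. Therefore, the
central moments of the normal distribution can be obtained by using the following form
of the probability density:

$$p(u) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{u^{2}}{2\sigma^{2}}}.$$

From this, we find

$$\mu_2 = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty} u^{2}\,e^{-\frac{u^{2}}{2\sigma^{2}}}\,du;$$

but $\mu_2$ is the variance and it was shown in Section 57 that the variance is equal to $\sigma^{2}$.
Consequently,

$$\mu_2 = \sigma^{2}.$$ (11.37)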


Furthermore, we have

$$\mu_3 = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty} u^{3}\,e^{-\frac{u^{2}}{2\sigma^{2}}}\,du = 0,$$ (11.38)

because the integrand is an odd function of u. If we integrate by parts, we can easily
show that

$$\mu_4 = 3\sigma^{4} = 3\mu_2^{2}.$$ (11.39)


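The expressions for $\mu_3$ and $\mu_4$ can be used as a preliminary criterion in determining
whether the observed distribution may be considered as an approximately normal
distribution. The answer is negative if $\mu_3$ differs greatly from 0 and/or $\mu_4$ differs greatly
from $3\sigma^{4}$.*

*A method of calculating the moments of an observed distribution will be taken up in
Part V.

We can easily obtain a general expression for an arbitrary-order central moment of
a normal distribution. A moment of arbitrary odd order is equal to 0 since the integrand
in the expression

$$\mu_{2k+1} = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty} u^{2k+1}\,e^{-\frac{u^{2}}{2\sigma^{2}}}\,du$$ (11.40)

is an odd function of u and the integral is taken from $-\infty$ to $+\infty$. For an arbitrary
even-order moment, we can derive a recursion formula. By definition,

$$\mu_{2k} = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty} u^{2k}\,e^{-\frac{u^{2}}{2\sigma^{2}}}\,du = \frac{\sigma^{2}}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty} u^{2k-1}\,\frac{u}{\sigma^{2}}\,e^{-\frac{u^{2}}{2\sigma^{2}}}\,du.$$

If we use the formula for integrating by parts

$$\int_{a}^{b} w\,dv = wv\,\Big|_{a}^{b} - \int_{a}^{b} v\,dw,$$

setting

$$w = u^{2k-1}, \qquad dv = \frac{u}{\sigma^{2}}\,e^{-\frac{u^{2}}{2\sigma^{2}}}\,du,$$

so that

$$v = -e^{-\frac{u^{2}}{2\sigma^{2}}}, \qquad dw = (2k - 1)\,u^{2k-2}\,du,$$

we obtain

$$\mu_{2k} = -\frac{\sigma}{\sqrt{2\pi}}\,u^{2k-1}\,e^{-\frac{u^{2}}{2\sigma^{2}}}\,\Big|_{-\infty}^{+\infty} + \frac{\sigma}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} (2k - 1)\,u^{2k-2}\,e^{-\frac{u^{2}}{2\sigma^{2}}}\,du.$$

The first term vanishes for $u = -\infty$ and $u = +\infty$, which is easy to show by applying
l’Hospital’s rule k times. The second term can be transformed as follows:

$$\frac{\sigma}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} (2k - 1)\,u^{2k-2}\,e^{-\frac{u^{2}}{2\sigma^{2}}}\,du = (2k - 1)\,\sigma^{2}\,\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty} u^{2k-2}\,e^{-\frac{u^{2}}{2\sigma^{2}}}\,du = (2k - 1)\,\sigma^{2}\mu_{2k-2}.$$

Thus,

$$\mu_{2k} = (2k - 1)\,\sigma^{2}\mu_{2k-2} = (2k - 1)(2k - 3)\,\sigma^{4}\mu_{2k-4} = \dots;$$

therefore,

$$\mu_{2k} = (2k - 1)(2k - 3)\cdots 3\cdot 1\cdot(\sigma^{2})^{k}\mu_0,$$ (11.41)

or

$$\mu_{2k} = 1\cdot 3\cdot 5\cdots(2k - 3)(2k - 1)\,\sigma^{2k},$$ (11.42)

since $\mu_0 = 1$. In particular, for k = 2, 3, 4,

$$\mu_4 = 1\cdot 3\,\mu_2^{2} = 3\sigma^{4}; \qquad \mu_6 = 1\cdot 3\cdot 5\,\mu_2^{3} = 15\sigma^{6}; \qquad \mu_8 = 1\cdot 3\cdot 5\cdot 7\,\mu_2^{4} = 105\sigma^{8}.$$ (11.43)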
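
The recursion for the even-order moments lends itself to a direct check. The sketch below is illustrative: the function names, the integration range of twelve mean square deviations, and the midpoint rule are assumptions, not part of the text. It computes the central moments by the recursion behind (11.42) and compares them with a numerical evaluation of the defining integral.

```python
import math

def central_moment_recursive(order, sigma):
    """Central moment of a normal law from mu_2k = (2k - 1) * sigma**2 * mu_(2k-2)."""
    if order % 2 == 1:
        return 0.0                      # odd-order moments vanish
    moment = 1.0                        # mu_0 = 1
    for j in range(1, order // 2 + 1):
        moment *= (2 * j - 1) * sigma ** 2
    return moment

def central_moment_numeric(order, sigma, half_width=12.0, steps=20000):
    """Midpoint-rule estimate of the integral of u**order * p(u) over a wide finite range."""
    a = -half_width * sigma
    h = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        u = a + (i + 0.5) * h
        total += u ** order * math.exp(-u * u / (2 * sigma ** 2))
    return total * h / (sigma * math.sqrt(2 * math.pi))

for order in (2, 4, 6, 8):
    print(order, central_moment_recursive(order, 1.0), round(central_moment_numeric(order, 1.0), 6))
```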


60. DISTRIBUTIONS OTHER THAN THE NORMAL 


If values of a random variable are obtained under conditions 
satisfying the Lyapunov theorem, the normal law must satis- 
factorily represent the results of observations of the random 
variable. The question of comparing the theoretical distribution 
with the empirical will be examined in detail in Part V of this 
book. Here, we mention one of the methods of comparison: (a) 
Numerical values of the parameters of the chosen theoretical law 
of distribution are selected; (b) the values of the theoretical distribution
function are determined for a number of chosen values of
the variable; (c) the corresponding empirical probabilities (that is, 
the ratios of the numbers of observed values not exceeding the 
values chosen in (b) to the number of all observations) are cal- 
culated. Comparison of the theoretical and empirical probabilities 
makes possible a determination of the degree of agreement of the 
theory with observations. 
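
Steps (a), (b), and (c) above can be sketched for the case in which the chosen theoretical law is normal. Everything specific in this sketch is an assumption made for illustration: the function names, the use of the sample mean and the sample mean square deviation as the parameters in step (a), and the made-up observations.

```python
import math

def normal_cdf(x, mean, sigma):
    """Distribution function of a normal law."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def compare_with_normal(observations, check_points):
    """Return rows (x, theoretical probability, empirical probability)."""
    n = len(observations)
    # Step (a): choose parameter values (here, estimated from the sample itself).
    mean = sum(observations) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in observations) / (n - 1))
    rows = []
    for x in check_points:
        theoretical = normal_cdf(x, mean, sigma)                    # step (b)
        empirical = sum(1 for obs in observations if obs <= x) / n  # step (c)
        rows.append((x, theoretical, empirical))
    return rows

# Made-up observations, for illustration only.
data = [9.8, 10.1, 10.4, 9.6, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0]
for x, theo, emp in compare_with_normal(data, [9.7, 10.0, 10.3]):
    print(x, round(theo, 3), emp)
```

With so few observations the two columns can differ appreciably; the detailed treatment of the comparison is deferred to Part V.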

The study of a large number of empirical distributions of dif- 
ferent variables has shown that the normal law does not always 
satisfactorily represent observations. In addition to variables that 
are not sufficiently close to the normal law, there are variables 
that, because of their physical properties, cannot obey a normal 
law. 

For example, let us consider the distribution of stellar paral- 
laxes. For two obvious reasons, this distribution cannot obey a 
normal law. In the first place, the normal law is defined for the 
entire real axis, that is, from $-\infty$ to $+\infty$, whereas parallax is
a positive quantity, and, consequently, the curve of distribution 
must be bounded on the left. Since, in the case of stars, there can 
in practice be no large parallaxes, the distribution curve is also 
bounded on the right. In the second place, the number of stars 
increases with decrease in parallax, so that the maximum proba- 
bility density will occur with a parallax value equal to 0. Since 
stars with large parallax are rare, the curve must approach the 
x-axis asymptotically with increase in parallax. The result is
that the distribution curve must have a shape somewhat like the
letter J; that is, it must differ quite sharply from a Gaussian
curve.




A second example is the distribution of absolute values of 
velocities of the members of some set of moving bodies, for ex- 
ample, of asteroids. Again, this quantity assumes only nonnegative 
values, and therefore, the distribution curve is bounded on the left. 
Since infinite velocities cannot have physical meaning, the curve 
must also be bounded on the right, although it is considerably more 
difficult in this case to state where the bound lies. 

From what has been said, it follows that both in solving the 
basic problem and in describing random variables, we may not 
confine ourselves just to a normal law. We must construct other 
theoretical distributions by using the observed probability patterns 
or other theoretical considerations. We shall briefly consider 
certain kinds of distribution curves that arise in practice. 


1. Type-A Charlier Curve. 


The form of a function representing a probability distribution in
the case in which the normal law is not suitable can be chosen
from the following considerations. If the normal law

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\bar{x})^{2}}{2\sigma^{2}}}$$

is not satisfactory, the probability density can be written in the
form

$$\varphi(x) = p(x)\,\Pi(x),$$

where $\Pi(x)$ is a polynomial of degree no higher than four. Charlier
called it the perturbational polynomial.* Let us replace the values
of x with their deviations u from the mean value $\bar{x}$. Then,

$$\varphi(u) = p(u)\,\Pi(u).$$

*The terminology is borrowed from celestial mechanics. It means that the
polynomial $\Pi(x)$ changes p(x) in such a way that a probability density will be obtained that
represents the observations more satisfactorily.

To determine the five coefficients of a fourth-degree polynomial,
we may obtain five equations by determining the moments from
zeroth to fourth order inclusively.

In the general case, this system has one solution. The coefficients
of the polynomial $\Pi(u)$ are expressed in terms of the
central moments of order 2, 3, and 4. Instead of the third- and
fourth-order moments, we introduce the asymmetry A and the
excess E, defined by

$$A = \frac{\mu_3}{\sigma^{3}},$$ (11.44)

$$E = \frac{\mu_4}{\sigma^{4}} - 3.$$ (11.45)

It follows from Section 59 that, for a normal law, A = E = 0.
Therefore, we may assume that A and E characterize the deviation
of the distribution from the normal. The coefficients $a_0$, $a_1$, $a_2$, $a_3$,
and $a_4$ of the polynomial $\Pi(u)$ can be expressed in terms of A, E,
and $\sigma$:

$$a_0 = 1 + \frac{1}{8}E, \qquad a_1 = -\frac{1}{2}\,\frac{A}{\sigma}, \qquad a_2 = -\frac{1}{4}\,\frac{E}{\sigma^{2}}, \qquad a_3 = \frac{1}{6}\,\frac{A}{\sigma^{3}}, \qquad a_4 = \frac{1}{24}\,\frac{E}{\sigma^{4}}.$$


If we choose the terms in the product $p(u)\,\Pi(u)$ with asymmetry and
excess separately, it turns out that the coefficient of A is equal to
the product obtained by multiplying the third derivative of the
normal density p(u) by $-\sigma^{3}/6$ and that the coefficient of E is
equal to the product of the fourth derivative of p(u) and $\sigma^{4}/24$.

The probability density of Charlier’s law then takes the form

$$\varphi(u) = p(u) - \frac{1}{6}A\sigma^{3}\,p'''(u) + \frac{1}{24}E\sigma^{4}\,p^{IV}(u).$$ (11.46)

It is clear from this equation that, for small values of A and E, the
terms containing asymmetry and excess are also small and, therefore,
the main term in the expression above is the probability
density of the normal distribution.

Tables are compiled for the functions $p'''(u)$ and $p^{IV}(u)$ for
$\sigma = 1$. By means of these tables, it is easy to calculate $\varphi(u)$.

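The solution of the basic problem is obtained quite simply:

$$P(\alpha < U < \beta) = \int_{\alpha}^{\beta} p(u)\,du - \frac{1}{6}A\sigma^{3}\int_{\alpha}^{\beta} p'''(u)\,du + \frac{1}{24}E\sigma^{4}\int_{\alpha}^{\beta} p^{IV}(u)\,du.$$

If we set $\sigma = 1$, that is, if we evaluate u taking $\sigma = 1$, the integration
becomes quite simple:

$$P(\alpha < U < \beta) = \Phi(\beta) - \Phi(\alpha) - \frac{1}{6}A\left[p''(\beta) - p''(\alpha)\right] + \frac{1}{24}E\left[p'''(\beta) - p'''(\alpha)\right].$$

Instead of $p''(u)$, we may write $\Phi'''(u)$ and analogously, we may
write $\Phi^{IV}(u)$ instead of $p'''(u)$. Tables have been compiled for the
functions $\Phi'''(u)$ and $\Phi^{IV}(u)$.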
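
The density (11.46) can be evaluated without tables of the derivatives. The sketch below rests on an assumption not stated in the text: it uses the identities $-\sigma^{3}p'''(u) = (t^{3} - 3t)\,p(u)$ and $\sigma^{4}p^{IV}(u) = (t^{4} - 6t^{2} + 3)\,p(u)$ with $t = u/\sigma$, which follow from differentiating the normal density. The function names and the numerical values of A and E are illustrative. The helper `moment` checks that the resulting density has unit area and reproduces the prescribed asymmetry and excess.

```python
import math

def normal_density(u, sigma):
    """Normal density of the deviation u."""
    return math.exp(-u * u / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def charlier_density(u, sigma, asym, excess):
    """Type-A Charlier density (11.46) for asymmetry `asym` and excess `excess`."""
    t = u / sigma
    h3 = t ** 3 - 3 * t             # -sigma**3 * p'''(u) equals h3 * p(u)
    h4 = t ** 4 - 6 * t ** 2 + 3    #  sigma**4 * p''''(u) equals h4 * p(u)
    return normal_density(u, sigma) * (1 + asym * h3 / 6 + excess * h4 / 24)

def moment(order, sigma, asym, excess, half_width=12.0, steps=24000):
    """Midpoint-rule estimate of the integral of u**order times the Charlier density."""
    h = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        u = -half_width * sigma + (i + 0.5) * h
        total += u ** order * charlier_density(u, sigma, asym, excess)
    return total * h

sigma, A, E = 2.0, 0.3, 0.5   # hypothetical parameter values
print(round(moment(0, sigma, A, E), 6))
print(round(moment(3, sigma, A, E) / sigma ** 3, 6))
print(round(moment(4, sigma, A, E) / sigma ** 4 - 3, 6))
```

For large |A| or |E| the expression can become negative for some u, in which case it no longer represents a probability density; the sketch does not guard against this.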


2. Pearson's Curves. 


Pearson obtained these distribution curves, whose equations he
derived, from an approximative study of a more general probability-theory
problem than that of repeated trials for different values of
p and q. The simplified derivation of the normal distribution can
be reduced to a special case of the more general problem. A
number of authors have pointed out that Pearson’s curves can be
obtained by formally generalizing the same differential equation as
that obtained for a Gaussian curve.

In the derivation of the normal law (see Section 56), the equation

$$\frac{dy}{y} = -\frac{u\,du}{\sigma^{2}}$$

contains a constant term in the denominator of the fraction on the
right. We shall obtain curves of a more general form if instead of
this constant, we have a function of u. If we assume that this
function can be expanded in a Maclaurin series and if we keep only
the first three terms, we obtain the differential equation for
Pearson’s curves:

$$\frac{dy}{y} = -\frac{(x - a)\,dx}{b_0 + b_1 x + b_2 x^{2}}.$$ (11.47)

This equation contains four parameters: a, $b_0$, $b_1$, and $b_2$. When the
equation is integrated, there will be yet another arbitrary constant,
but it will be expressed in terms of these four parameters. For, 
from the normalization condition, the area underneath the curve 
must be equal to unity. This condition gives an equation for 
determining the arbitrary constant just as in the case of a normal 
distribution. 

Integration of equation (11.47) gives a whole series of distribu- 
tion functions (including functions with U-shaped and J-shaped 
curves), which can be used for making interpolations with observa- 
tional data. Tables have been compiled to make the use of Pearson’s 
curves easier. One can familiarize oneself with Pearson’s curves
from the book Krivye raspredeleniya i postroenie dlya nikh
interpolyatsionnykh formul po sposobam Pirsona i Brunsa (Distribution
Curves and the Construction of Interpolational Formulas
for them by the Methods of Pearson and Bruns) by L. K. Lakhtin, 
Moscow, 1922. As an example, we give the equation for a type- 
three Pearson curve that finds application in certain problems: 


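$$y = y_0\,e^{-\gamma x}\left(1 + \frac{x}{a}\right)^{\gamma a},$$ (11.48)

where

$$\gamma = \frac{2\mu_2}{\mu_3}, \qquad a = \frac{2\mu_2^{2}}{\mu_3} - \frac{\mu_3}{2\mu_2}, \qquad m = \gamma a = \frac{4\mu_2^{3}}{\mu_3^{2}} - 1,$$

$$y_0 = \frac{1}{a}\,\frac{m^{m+1}}{e^{m}\,\Gamma(m+1)}.$$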




Here,

$$\Gamma(m + 1) = \int_{0}^{\infty} x^{m} e^{-x}\,dx$$

is Euler’s integral of the second kind, which cannot be solved in
terms of elementary functions unless m is a nonnegative integer.
Tables have also been compiled of this integral.

In addition to Pearson’s curves, certain other distribution curves 
of a particular form have been constructed. However, it should be 
mentioned that in practice investigators prefer not to choose 
functions with more than four parameters. If the number of 
parameters were in excess of four, one would need to use calculated
moments of higher than fourth order in setting up the
equations for determining the parameters. Moments of high orders,
as can easily be seen from the definition and the method of com- 
puting them, depend to a great extent on the ‘‘boundaries’’ of the 
distribution, which may be very unindicative of the distribution. 
For the same reason, only the first three terms are taken in the 
denominator in the differential equation of Pearson’s curves. 

It should be especially noted that cases are frequently en- 
countered in which an empirical distribution curve has two 
maxima. For such distributions, even the mean value of the variable
should not be considered as characteristic. In a number of
such cases, we may assume that the statistical material repre- 
sents the sum of two distributions, each of which is close to a 
normal distribution. In the case of such distributions, the question 
of decomposing them into two normal distributions has been 
worked out. 


3. Maxwellian Distribution. 


A Maxwellian distribution is a distribution of absolute values of
velocities of molecules, point masses, or other particles that is
obtained under the assumption that the components of the velocities
have the same variances along each of the (rectangular) coordinate
axes. It may be shown that the probability density of a Maxwellian
distribution is of the form

$$p(v) = \sqrt{\frac{2}{\pi}}\,\frac{v^{2}}{\sigma_0^{3}}\,e^{-\frac{v^{2}}{2\sigma_0^{2}}},$$ (11.49)

where v is the absolute value of the velocity and $\sigma_0$ is the single
distribution parameter representing the mean square deviation
along each of the coordinate axes. We state without proof the basic
numerical characteristics of this distribution:

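The mean value

$$\bar{v} = \frac{4\sigma_0}{\sqrt{2\pi}} \approx 1.596\,\sigma_0; \qquad p(\bar{v}) = \frac{16}{\pi\sqrt{2\pi}}\,e^{-\frac{4}{\pi}}\,\frac{1}{\sigma_0} \approx \frac{0.569}{\sigma_0};$$

the mode

$$v_m = \sigma_0\sqrt{2} \approx 1.414\,\sigma_0; \qquad p(v_m) = \frac{4}{e\sqrt{2\pi}}\,\frac{1}{\sigma_0} \approx \frac{0.587}{\sigma_0};$$

the mean square deviation

$$\sigma = \sigma_0\sqrt{3 - \frac{8}{\pi}} \approx 0.673\,\sigma_0;$$

the distribution function

$$F(v) = P(V < v) = 2\Phi\left(\frac{v}{\sigma_0}\right) - 2\,\frac{v}{\sigma_0}\,\varphi\left(\frac{v}{\sigma_0}\right),$$

where $\Phi$ is the probability integral and $\varphi$ is the standard normal density (11.26).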

Values of the distribution function are given in the table 


We also give the results of the computation of several probabilities:

$$P(V < v_m) = 0.427,$$
$$P(V < \bar{v}) = 0.535,$$
$$P(\bar{v} - \sigma < V < \bar{v} + \sigma) = 0.677,$$
$$P(\bar{v} - 3\sigma < V < \bar{v} + 3\sigma) = 0.995.$$

The last two probabilities differ only slightly from the analogous
probabilities for a normal law. If we construct a normal distribution
with center $\bar{v}$ and with mean square deviation $\sigma$, then for
$\sigma_0 = 1$, the ordinates of the distribution curves will not differ by an
amount greater than 0.08.
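
The characteristics stated above can be recomputed from the density (11.49). The following sketch is illustrative; the function names are assumptions, and the distribution function is written through the error function using $2\Phi(x) = \operatorname{erf}(x/\sqrt{2})$.

```python
import math

def maxwell_density(v, s0=1.0):
    """Maxwellian probability density (11.49) with parameter s0."""
    return math.sqrt(2 / math.pi) * v * v / s0 ** 3 * math.exp(-v * v / (2 * s0 ** 2))

def maxwell_cdf(v, s0=1.0):
    """Distribution function F(v) = 2*Phi(v/s0) - 2*(v/s0)*phi(v/s0)."""
    x = v / s0
    return math.erf(x / math.sqrt(2)) - math.sqrt(2 / math.pi) * x * math.exp(-x * x / 2)

# Characteristics for s0 = 1.
mean = 4 / math.sqrt(2 * math.pi)       # mean value
mode = math.sqrt(2)                     # mode
sigma = math.sqrt(3 - 8 / math.pi)      # mean square deviation

print(round(mean, 4), round(mode, 4), round(sigma, 4))
print(round(maxwell_cdf(mode), 3), round(maxwell_cdf(mean), 3))
print(round(maxwell_cdf(mean + sigma) - maxwell_cdf(mean - sigma), 3))
print(round(maxwell_cdf(mean + 3 * sigma), 3))
```

Since $\bar{v} - 3\sigma$ is negative and V takes only nonnegative values, the last probability reduces to $F(\bar{v} + 3\sigma)$.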


4. Student's Distribution. 


This distribution is used for estimating the probability of
deviations of sample means from the mean of a general set that obeys a
normal distribution law. V. I. Romanovskii used Student’s distribution
in the theory of errors for problems involving a small
number of observations. The probability density in Student’s distribution
is of the form

$$p(t) = C\left(1 + \frac{t^{2}}{n - 1}\right)^{-\frac{n}{2}},$$ (11.50)

where n is the number of objects in the sample (in particular, the
number of observations). The remaining quantities are determined
by the formulas

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}, \qquad \sigma^{2} = \frac{\sum_{k=1}^{n}(x_k - \bar{x})^{2}}{n - 1}, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}, \qquad t = \frac{\bar{x} - x_0}{\sigma_{\bar{x}}}.$$

Here, C depends only on n; $x_1, x_2, \dots, x_n$ are sample values of the
variable; $x_0$ is the mean value in the entire (general) set. It is
sensible to use Student’s distribution when the number of observations
does not exceed twenty since, for n = 20, the distribution
varies only slightly from a normal distribution.
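
The quantities entering (11.50) can be computed from a sample as follows. This is a sketch under assumptions: the text says only that C depends on n, and the explicit value used here, $C = \Gamma(n/2) / \left(\sqrt{\pi(n-1)}\,\Gamma((n-1)/2)\right)$, is the standard normalizing constant for this form of the density rather than something quoted from the source. The function names and the sample values are illustrative.

```python
import math

def student_t(sample, general_mean):
    """t = (sample mean - general mean) / (sigma / sqrt(n)), with sigma from n - 1."""
    n = len(sample)
    mean = sum(sample) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    sigma_mean = sigma / math.sqrt(n)
    return (mean - general_mean) / sigma_mean

def student_density(t, n):
    """Density (11.50); the constant C is an assumed standard normalization."""
    c = math.gamma(n / 2) / (math.sqrt(math.pi * (n - 1)) * math.gamma((n - 1) / 2))
    return c * (1 + t * t / (n - 1)) ** (-n / 2)

# Made-up sample, for illustration only.
print(round(student_t([1.0, 2.0, 3.0, 4.0, 5.0], 2.0), 4))
# Ordinates at t = 0 for n = 5 and n = 20, against the normal value 1/sqrt(2*pi).
print(round(student_density(0.0, 5), 4), round(student_density(0.0, 20), 4),
      round(1 / math.sqrt(2 * math.pi), 4))
```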


Chapter 12 


JOINT PROBABILITY DISTRIBUTION 
OF TWO CONTINUOUS 
RANDOM VARIABLES 


61. THE JOINT PROBABILITY DENSITY 
OF TWO VARIABLES 


Let us consider two continuous random variables X and Y together.
The basic problem for us is to compute the probability that X will
assume a value in a given interval $(x_1, x_2)$ and Y will assume a value
in an interval $(y_1, y_2)$, that is, that

$$x_1 < X < x_2, \qquad y_1 < Y < y_2.$$

If these events (that is, satisfaction of these inequalities) are independent,
the random variables are said to be independent and the
probability that we are interested in will be the product of the
probabilities of the two inequalities:

$$P\left\{\begin{matrix} x_1 < X < x_2 \\ y_1 < Y < y_2 \end{matrix}\right\} = P(x_1 < X < x_2)\,P(y_1 < Y < y_2).$$


If the probability that one of these variables will take a value in an
arbitrary interval depends on the value of the other variable, these
variables are said to be correlated (or we may say that there is a
correlation between X and Y). The correlation is completely determined
from the point of view of probability theory if we know the
law determining the probability of the simultaneous satisfaction of
the above inequalities for arbitrary values of $x_1$, $x_2$, $y_1$, and $y_2$. Such
a law may be either the joint distribution function of the two
random variables or it may be the probability density. We can
introduce the concept of probability density in the following manner:
Suppose that we know the probability

$$P\left\{\begin{matrix} x < X < x + \Delta x \\ y < Y < y + \Delta y \end{matrix}\right\}$$

that X will assume a value in the interval from x to $x + \Delta x$ and
that Y will simultaneously take a value from y to $y + \Delta y$ (for
$\Delta x > 0$ and $\Delta y > 0$). In general, this probability depends on x, y,
$\Delta x$, and $\Delta y$. The number

$$\frac{P\left\{\begin{matrix} x < X < x + \Delta x \\ y < Y < y + \Delta y \end{matrix}\right\}}{\Delta x\,\Delta y}$$ (12.1)

may be called the mean probability density in the intervals $(x, x + \Delta x)$
and $(y, y + \Delta y)$. Let us find

$$\lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{P\left\{\begin{matrix} x < X < x + \Delta x \\ y < Y < y + \Delta y \end{matrix}\right\}}{\Delta x\,\Delta y}.$$ (12.2)


If this limit exists, it will, in general, be a function of x and y. We
denote it by f(x, y) and call it the joint probability density of two
random variables.* From the definition, we have the approximate
equality:

$$P\left\{\begin{matrix} x < X < x + \Delta x \\ y < Y < y + \Delta y \end{matrix}\right\} \approx f(x, y)\,\Delta x\,\Delta y,$$ (12.3)

if $\Delta x$ and $\Delta y$ are sufficiently small.

If the probability density of a set of two random variables is
given, the probability that they will take values in the intervals
from $x_1$ to $x_2$ and from $y_1$ to $y_2$ is determined from the obvious exact
formula representing a generalization of the analogous formula for
the one-dimensional problem:

$$P\left\{\begin{matrix} x_1 < X < x_2 \\ y_1 < Y < y_2 \end{matrix}\right\} = \int_{y_1}^{y_2}\int_{x_1}^{x_2} f(x, y)\,dx\,dy.$$ (12.4)


We might pose the more general problem of the probability that
the variables will take values in some two-dimensional region S.
This problem is solved by the formula

$$P\{(X, Y) \in S\} = \iint_{S} f(x, y)\,dx\,dy.$$ (12.5)

Suppose that all combinations of values of the random variables X
and Y are contained in a two-dimensional region L. Since our
variables will certainly take some value or other in this region L,
we have, from the last formula,

$$\iint_{L} f(x, y)\,dx\,dy = 1.$$ (12.6)


*This function is also called the differential distribution function. 




This formula expresses the basic property of a two-dimensional
probability density, which, in analogy with the corresponding
property in the case of the one-dimensional problem, we may call
the condition of normalization of probability density.

It follows from the definition of probability density that f(x, y) = 0
outside the region L. Therefore, the condition of normalization can
always be written in the form

$$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x, y)\,dx\,dy = 1.$$ (12.7)


We can determine the theoretical mean value (the expectation)
for each of the variables $X$ and $Y$ from the following formulas:

$$\bar{x} = E(X) = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} x\,f(x, y)\,dx\,dy, \tag{12.8}$$

$$\bar{y} = E(Y) = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} y\,f(x, y)\,dx\,dy. \tag{12.9}$$

The point with coordinates $(\bar{x}, \bar{y})$ is called the center of distribution.
A generalization of these concepts is the concept of the expectation
of the function $\psi(X, Y)$ of the variables in question:

$$E[\psi(X, Y)] = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} \psi(x, y)\,f(x, y)\,dx\,dy. \tag{12.10}$$


The reader can independently define the initial and central moments 
of various orders for a set of two variables. 
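
Formulas (12.7) to (12.10) lend themselves to a direct numerical check. The following is a minimal sketch under stated assumptions: the density $f(x, y) = x + y$ on the unit square (zero elsewhere), the function $\psi(x, y) = xy$, and the helper `integrate2d` are our own illustrative choices, not taken from the text, and the double integrals are approximated by the midpoint rule.

```python
def f(x, y):
    """Illustrative joint density (an assumption): x + y on the unit square, 0 elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def integrate2d(g, n=200):
    """Midpoint-rule approximation of the double integral of g over [0, 1] x [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            total += g(x, y)
    return total * h * h


norm = integrate2d(f)                                # normalization condition (12.7)
x_mean = integrate2d(lambda x, y: x * f(x, y))       # expectation of X, formula (12.8)
y_mean = integrate2d(lambda x, y: y * f(x, y))       # expectation of Y, formula (12.9)
e_xy = integrate2d(lambda x, y: x * y * f(x, y))     # formula (12.10) with psi(x, y) = x*y
```

For this particular density the integrals can also be done by hand: the normalization integral is 1, both means equal 7/12, and $E(XY) = 1/3$, so the numerical values should agree with these to within the discretization error.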


62. CONDITIONAL PROBABILITY DENSITY 


In this section, we shall study the relationship between the joint
probability density $f(x, y)$ of the two variables $X$ and $Y$ and the
probability densities $f_1(x)$ and $f_2(y)$ of the two individual variables.

This relationship is found most simply for independent variables.
For if $X$ and $Y$ are independent, then, for arbitrary $\Delta x > 0$
and $\Delta y > 0$,

$$P(x < X < x + \Delta x,\; y < Y < y + \Delta y) = P(x < X < x + \Delta x)\,P(y < Y < y + \Delta y).$$

If we divide this equation by the product $\Delta x\,\Delta y$ and take the limit
as $\Delta x$ and $\Delta y$ approach 0, we obtain the joint probability density of
$X$ and $Y$ on the left side and the product of the individual probability
densities of the variables $X$ and $Y$ on the right:

$$f(x, y) = f_1(x)\,f_2(y). \tag{12.11}$$




Let us turn now to two correlated variables. From the theorem 
on multiplication of probabilities, we have the exact equations 


$$P(x < X < x + \Delta x,\; y < Y < y + \Delta y) = P(x < X < x + \Delta x)\,P(y < Y < y + \Delta y \mid x < X < x + \Delta x),$$


in which the second factor on the right is the conditional probability
that the variable $Y$ will take a value in the interval from $y$ to $y + \Delta y$
if it is known that the variable $X$ takes a value from $x$ to $x + \Delta x$. If
we divide both sides of this equation by $\Delta x\,\Delta y$ and take the limit as
$\Delta x$ and $\Delta y$ approach 0, the left side will give the joint probability
density $f(x, y)$ of the two variables $X$ and $Y$. The first factor on the
right, when divided by $\Delta x$, will, in the limit, give the probability
density $f_1(x)$ of the variable $X$. The second factor on the right,
when divided by $\Delta y$, will give a function of $y$ containing $x$ as a
parameter and not containing $\Delta x$ or $\Delta y$. We denote this function by
$\varphi_2(y \mid x)$ and call it the conditional probability density of the variable
$Y$ for a given value of $x$. This conditional probability density is formally
defined by the equation

$$\varphi_2(y \mid x) = \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{P(y < Y < y + \Delta y \mid x < X < x + \Delta x)}{\Delta y}.$$


If we take the limit in the expression for the probability, we get the
equation

$$f(x, y) = f_1(x)\,\varphi_2(y \mid x). \tag{12.12}$$

In an analogous fashion, we obtain the equation

$$f(x, y) = f_2(y)\,\varphi_1(x \mid y), \tag{12.13}$$

in which $\varphi_1(x \mid y)$ is the conditional probability density of the random
variable $X$ given that $Y$ takes the value $y$. This function is determined
by the formula

$$\varphi_1(x \mid y) = \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{P(x < X < x + \Delta x \mid y < Y < y + \Delta y)}{\Delta x}.$$


From these relationships between the conditional and unconditional 
probability densities it follows that only three of the five can be 
given to some degree arbitrarily. The other two can be calculated 
from the formulas that we have obtained. Also, our choice is 
limited by the normalization conditions of the five probability 
densities: 


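$$\int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} f(x, y)\,dx\,dy = 1, \qquad \int_{-\infty}^{+\infty} f_1(x)\,dx = 1, \qquad \int_{-\infty}^{+\infty} f_2(y)\,dy = 1,$$

$$\int_{-\infty}^{+\infty} \varphi_1(x \mid y)\,dx = 1, \qquad \int_{-\infty}^{+\infty} \varphi_2(y \mid x)\,dy = 1.$$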
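
Relations (12.12) and (12.13) and the normalization of a conditional density can be illustrated on a concrete case. The sketch below assumes the hypothetical density $f(x, y) = x + y$ on the unit square (our own choice, not from the text), obtains the individual densities by integrating the joint density over the other variable (a standard fact), and then forms the conditional densities by division. The helper names are illustrative.

```python
def f(x, y):
    """Illustrative joint density (an assumption): x + y on the unit square, 0 elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def integrate(g, a=0.0, b=1.0, n=1000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h


def f1(x):
    """Density of X alone: the joint density integrated over y."""
    return integrate(lambda y: f(x, y))


def f2(y):
    """Density of Y alone: the joint density integrated over x."""
    return integrate(lambda x: f(x, y))


def phi2(y, x):
    """Conditional density of Y given X = x, solved from (12.12)."""
    return f(x, y) / f1(x)


def phi1(x, y):
    """Conditional density of X given Y = y, solved from (12.13)."""
    return f(x, y) / f2(y)


x0, y0 = 0.3, 0.8
lhs = f1(x0) * phi2(y0, x0)                      # right side of (12.12)
rhs = f2(y0) * phi1(x0, y0)                      # right side of (12.13)
cond_norm = integrate(lambda y: phi2(y, x0))     # normalization of phi2(. | x0)
```

Both products reproduce $f(x_0, y_0)$, and the conditional density integrates to unity for the chosen $x_0$, as the normalization conditions require.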


Equations (12.12) and (12.13) make it possible to obtain a simple
differential relationship between the two conditional probability
densities. From these equations, we have

$$f_1(x)\,\varphi_2(y \mid x) = f_2(y)\,\varphi_1(x \mid y). \tag{12.14}$$

If we take the logarithms of both sides of this equation, we obtain

$$\ln f_1(x) + \ln \varphi_2(y \mid x) = \ln f_2(y) + \ln \varphi_1(x \mid y).$$

Let us differentiate this equation once with respect to $x$ and once
with respect to $y$. The derivatives of the first terms on each side
of this equation are equal to 0 and we have the desired relationship:

$$\frac{\partial^2 [\ln \varphi_1(x \mid y)]}{\partial x\,\partial y} = \frac{\partial^2 [\ln \varphi_2(y \mid x)]}{\partial x\,\partial y}. \tag{12.15}$$


It follows from equations (12.12) and (12.13) that the joint probability
density of two random variables is a function in which the probability
density of each of the individual variables occurs as a separate
factor. Consequently, we can write $f(x, y)$ in the following form:

$$f(x, y) = f_1(x)\,f_2(y)\,\varphi(x, y). \tag{12.16}$$

It follows from (12.11) and (12.16) that the function $\varphi(x, y)$ is identically
equal to unity in the case of independent variables. It is not
difficult to show that the converse is also true. If $\varphi(x, y) \equiv 1$,
then, for arbitrary values of $x_1$, $x_2$, $y_1$, and $y_2$, we may write
for a rectangular region:

$$P(x_1 < X < x_2,\; y_1 < Y < y_2) = \int_{y_1}^{y_2}\!\int_{x_1}^{x_2} f_1(x)\,f_2(y)\,dx\,dy = \int_{y_1}^{y_2} f_2(y)\,dy \int_{x_1}^{x_2} f_1(x)\,dx, \tag{12.17}$$

or

$$P(x_1 < X < x_2,\; y_1 < Y < y_2) = P(x_1 < X < x_2)\,P(y_1 < Y < y_2).$$


This last equation shows that our variables are independent. 
Thus, the variables X and Y are mutually independent if and 
only if 


$$\varphi(x, y) = 1. \tag{12.18}$$

Consequently, the nature of the correlation is completely given by
the function $\varphi(x, y)$.
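
As a small illustration of criterion (12.18), the factor $\varphi(x, y)$ of (12.16) can be evaluated for two hypothetical densities on the unit square: $f = x + y$ (individual densities $x + \tfrac12$ and $y + \tfrac12$) and $f = 4xy$ (individual densities $2x$ and $2y$). Both densities and the helper `phi` are assumptions made for this sketch only.

```python
def phi(f, f1, f2, x, y):
    """Factor phi(x, y) of (12.16): the joint density divided by the product
    of the densities of the individual variables."""
    return f(x, y) / (f1(x) * f2(y))


# Hypothetical dependent pair: f = x + y, individual densities t + 1/2.
dep = phi(lambda x, y: x + y,
          lambda x: x + 0.5,
          lambda y: y + 0.5,
          0.3, 0.8)

# Hypothetical independent pair: f = 4xy, individual densities 2t.
ind = phi(lambda x, y: 4.0 * x * y,
          lambda x: 2.0 * x,
          lambda y: 2.0 * y,
          0.3, 0.8)
```

In the second case the factor equals unity at every point of the square, so the variables are independent; in the first case it differs from unity, so they are correlated.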




One of the characteristics of the conditional distribution of X 
for a given value of Y = y is the conditional expectation (conditional 
mean value) given by the formula 


$$m_1(y) = \int_{-\infty}^{+\infty} x\,\varphi_1(x \mid y)\,dx. \tag{12.19}$$

The other conditional mean value is given by the analogous formula

$$m_2(x) = \int_{-\infty}^{+\infty} y\,\varphi_2(y \mid x)\,dy. \tag{12.20}$$

The lines in the $xy$-plane corresponding to the equations $x = m_1(y)$
and $y = m_2(x)$ are called the lines of regression and their equations
are called the equations of regression.

From the definitions of the mean values $\bar{x}$ and $\bar{y}$ and the
conditional mean values $m_1(y)$ and $m_2(x)$, we have the relations

$$\bar{x} = \int_{-\infty}^{+\infty} f_2(y)\,m_1(y)\,dy, \qquad \bar{y} = \int_{-\infty}^{+\infty} f_1(x)\,m_2(x)\,dx. \tag{12.21}$$
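
The conditional mean (12.19) and the first of relations (12.21) can be verified numerically. The sketch assumes, purely for illustration, the density $f(x, y) = x + y$ on the unit square, for which the density of $Y$ alone is $f_2(y) = y + \tfrac12$ (found by hand); the function names are our own.

```python
def f(x, y):
    """Illustrative joint density (an assumption): x + y on the unit square, 0 elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def f2(y):
    """Density of Y alone for this example, computed by hand: y + 1/2 on [0, 1]."""
    return y + 0.5


def integrate(g, a=0.0, b=1.0, n=400):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h


def m1(y):
    """Conditional mean of X given Y = y, formula (12.19),
    with phi1(x | y) = f(x, y) / f2(y) taken from (12.13)."""
    return integrate(lambda x: x * f(x, y)) / f2(y)


# First of relations (12.21): the mean of X recovered from the regression function.
x_mean = integrate(lambda y: f2(y) * m1(y))
```

Here $m_1(y) = (\tfrac13 + \tfrac12 y)/(y + \tfrac12)$, so the line of regression of $X$ on $Y$ is curved for this density, and the weighted average of $m_1(y)$ returns $\bar{x} = 7/12$.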


63. THE NORMAL DISTRIBUTION OF TWO 
RANDOM VARIABLES 


The distribution of two random variables is said to be normal if
each of them obeys Gauss’ law when the value of the other lies
within an arbitrarily small interval. This definition assumes that
the functions $\varphi_1(x \mid y)$ and $\varphi_2(y \mid x)$, which we called the conditional
probability densities of the variables $X$ and $Y$, are densities of
normal distributions. Since the quantity $y$ is a parameter for the
function $\varphi_1(x \mid y)$, it is natural to assume in the general case that
the mean value and the mean square deviation, which appear in the
density of a normal distribution, depend on $y$. Therefore, the
function $\varphi_1(x \mid y)$ must take the form

$$\varphi_1(x \mid y) = \frac{1}{s_1(y)\sqrt{2\pi}} \exp\left\{-\frac{[x - m_1(y)]^2}{2 s_1^2(y)}\right\}. \tag{12.22}$$

Here, $m_1(y)$ is the conditional mean value (expectation) of the variable
$X$ for given $y$ and $s_1(y)$ is the conditional mean square deviation
of $X$. For the same reasons, we may write

$$\varphi_2(y \mid x) = \frac{1}{s_2(x)\sqrt{2\pi}} \exp\left\{-\frac{[y - m_2(x)]^2}{2 s_2^2(x)}\right\}, \tag{12.23}$$

where $m_2$ and $s_2$ have the same meanings for the variable $Y$ as $m_1$
and $s_1$ have for $X$.




Let us use equation (12.15) to determine the dependence between
the functions $s_1(y)$, $s_2(x)$, $m_1(y)$, and $m_2(x)$. This relationship will make
possible the determination of the form of these functions. We introduce
the notation

$$k_1(y) = \frac{1}{s_1^2(y)}, \qquad k_2(x) = \frac{1}{s_2^2(x)}. \tag{12.24}$$

Then,

$$\ln \varphi_1(x \mid y) = -\ln\sqrt{2\pi} + \ln\sqrt{k_1(y)} - \tfrac{1}{2}\,k_1(y)\,[x - m_1(y)]^2,$$

$$\ln \varphi_2(y \mid x) = -\ln\sqrt{2\pi} + \ln\sqrt{k_2(x)} - \tfrac{1}{2}\,k_2(x)\,[y - m_2(x)]^2.$$

If we differentiate once with respect to $x$ and once with respect to
$y$, the first two terms on the right sides of these equations vanish.
Therefore, according to (12.15),

$$x\,k_1'(y) - m_1(y)\,k_1'(y) - m_1'(y)\,k_1(y) = y\,k_2'(x) - m_2(x)\,k_2'(x) - m_2'(x)\,k_2(x). \tag{12.25}$$


We now differentiate equation (12.25) once with respect to $y$ and
once with respect to $x$. Since the second and third terms on the
left depend only on $y$, their derivatives with respect to $x$ are equal
to 0. For the same reason, the derivatives of the second and third
terms on the right are equal to 0. As a result, we have

$$k_1''(y) = k_2''(x).$$

Since $x$ and $y$ are not functionally related, this identity is possible
only when

$$k_1''(y) = 2C \quad \text{and} \quad k_2''(x) = 2C,$$

where $C$ is an arbitrary constant. If we twice integrate each of
these equations, we obtain

$$k_1(y) = C y^2 + d_1 y + f_1,$$
$$k_2(x) = C x^2 + d_2 x + f_2,$$

where $d_1$, $d_2$, $f_1$, and $f_2$ are arbitrary constants resulting from the
integration. The form that we have obtained for the function $k_1(y)$
is such that $k_1(y)$ approaches $\infty$ as $y$ approaches $\infty$ provided $C$ and
$d_1$ are not both equal to 0. Since, by (12.24),

$$k_1(y) = \frac{1}{s_1^2(y)},$$

under these conditions $s_1(y)$ approaches 0.




The function $s_1(y)$ represents the mean square deviation of the
normal distribution of $X$ when $Y = y$. If $s_1(y)$ is very small for
large values of $y$, this means that even small deviations from the mean
$m_1(y)$ are extremely unlikely.

In most applications of the theory of correlation, a relationship
of this nature is not of great interest since it would indicate
stability (near constancy) of the values of $X$ for large values of
$Y$. The same may be said about the function $k_2(x)$. Therefore, to
exclude these special forms of distributions, we set

$$C = 0, \qquad d_1 = d_2 = 0,$$

so that

$$k_1(y) = f_1, \qquad k_2(x) = f_2.$$

Therefore,

$$k_1 = \frac{1}{s_1^2} \quad \text{and} \quad k_2 = \frac{1}{s_2^2},$$

where $s_1$ and $s_2$ are now constants. If we substitute these values
of $k_1$ and $k_2$ into equation (12.25), we obtain

$$\frac{m_1'(y)}{s_1^2} = \frac{m_2'(x)}{s_2^2}. \tag{12.26}$$

Since $x$ and $y$ are not functionally related, this equality is possible
only if each of the functions appearing on the two sides of (12.26) is
a constant, that is, only if

$$\frac{m_1'(y)}{s_1^2} = m, \qquad \frac{m_2'(x)}{s_2^2} = m,$$


where $m$ is an arbitrary constant. If we integrate these two
equations, we obtain

$$m_1(y) = m s_1^2\,y + p_1, \qquad m_2(x) = m s_2^2\,x + p_2, \tag{12.27}$$

where $p_1$ and $p_2$ are arbitrary constants resulting from the integration.
If we substitute these values of $m_1(y)$ and $m_2(x)$ into the expressions
(12.22) and (12.23) for the conditional densities, we obtain

$$\varphi_1(x \mid y) = \frac{1}{s_1\sqrt{2\pi}} \exp\left\{-\frac{[x - (m s_1^2\,y + p_1)]^2}{2 s_1^2}\right\}, \tag{12.28}$$

$$\varphi_2(y \mid x) = \frac{1}{s_2\sqrt{2\pi}} \exp\left\{-\frac{[y - (m s_2^2\,x + p_2)]^2}{2 s_2^2}\right\}. \tag{12.29}$$


The quantity $m s_1^2\,y + p_1$ in equation (12.28) is the theoretical
mean value of the variable $X$ corresponding to the given value of
$y$ and the quantity $m s_2^2\,x + p_2$ in equation (12.29) is the theoretical
mean value of $Y$ corresponding to the given value of $x$. The mean
square deviations $s_1$ and $s_2$ of the conditional distributions are then
constants.

Thus, in the case of a normal correlation, the mean of those
values of each variable which correspond to a definite value of the
other variable is a linear function of that value. In other words, the
lines of regression are straight. The equations corresponding to
these lines are of the form

$$x = m s_1^2\,y + p_1, \qquad y = m s_2^2\,x + p_2. \tag{12.30}$$

The coefficients $m s_1^2$ and $m s_2^2$ in these equations are called the
coefficients of regression. Let us denote by $\bar{x}$ and $\bar{y}$ the coordinates of
the point of intersection of the lines of regression. Obviously,

$$p_1 = \bar{x} - m s_1^2\,\bar{y}, \qquad p_2 = \bar{y} - m s_2^2\,\bar{x}.$$

Therefore, the conditional probability densities and the equations
of regression can be written in the form

$$\varphi_1(x \mid y) = \frac{1}{s_1\sqrt{2\pi}} \exp\left\{-\frac{[(x - \bar{x}) - m s_1^2 (y - \bar{y})]^2}{2 s_1^2}\right\},$$

$$\varphi_2(y \mid x) = \frac{1}{s_2\sqrt{2\pi}} \exp\left\{-\frac{[(y - \bar{y}) - m s_2^2 (x - \bar{x})]^2}{2 s_2^2}\right\},$$

$$x - \bar{x} = m s_1^2\,(y - \bar{y}),$$

$$y - \bar{y} = m s_2^2\,(x - \bar{x}).$$


64. THE PROBABILITY DENSITY OF A 
NORMAL DISTRIBUTION 


To derive a formula for the probability density of a normal distribution,
we move the coordinate origin to the point of intersection
$(\bar{x}, \bar{y})$ of the lines of regression.

Then, the equations of the lines of regression for the new (displaced)
random variables $U$ and $V$ are of the form

$$u = m s_1^2\,v, \qquad v = m s_2^2\,u, \tag{12.31}$$

and the conditional probability densities are written in the form

$$\varphi_1(u \mid v) = \frac{1}{s_1\sqrt{2\pi}} \exp\left\{-\frac{u^2 - 2 m s_1^2\,uv + m^2 s_1^4\,v^2}{2 s_1^2}\right\},$$

$$\varphi_2(v \mid u) = \frac{1}{s_2\sqrt{2\pi}} \exp\left\{-\frac{v^2 - 2 m s_2^2\,uv + m^2 s_2^4\,u^2}{2 s_2^2}\right\}. \tag{12.32}$$




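If we substitute these values of $\varphi_1$ and $\varphi_2$ into equation (12.14), we
obtain

$$f_1(u)\,\frac{1}{s_2\sqrt{2\pi}} \exp\left(-\frac{v^2}{2 s_2^2} + m\,uv - \frac{m^2 s_2^2\,u^2}{2}\right) = f_2(v)\,\frac{1}{s_1\sqrt{2\pi}} \exp\left(-\frac{u^2}{2 s_1^2} + m\,uv - \frac{m^2 s_1^2\,v^2}{2}\right),$$

where $f_1(u)$ and $f_2(v)$ are as yet unknown probability densities of the
distribution of the two individual variables.

Let us now cancel $e^{muv}$ and the common factor $1/\sqrt{2\pi}$, and let us rewrite
the equation so that on one side we have factors depending only on $u$
and on the other side factors depending only on $v$:

$$\frac{1}{s_2}\,f_1(u) \exp\left[-\frac{u^2}{2}\left(m^2 s_2^2 - \frac{1}{s_1^2}\right)\right] = \frac{1}{s_1}\,f_2(v) \exp\left[-\frac{v^2}{2}\left(m^2 s_1^2 - \frac{1}{s_2^2}\right)\right].$$

Since $u$ and $v$ are not functionally related, this equation is possible
only if

$$\frac{1}{s_2}\,f_1(u) \exp\left[-\frac{u^2}{2}\left(m^2 s_2^2 - \frac{1}{s_1^2}\right)\right] = \frac{1}{s_1}\,f_2(v) \exp\left[-\frac{v^2}{2}\left(m^2 s_1^2 - \frac{1}{s_2^2}\right)\right] = C, \tag{12.33}$$

where $C$ is a constant. Hence,

$$f_1(u) = C s_2 \exp\left[-\frac{u^2}{2}\left(\frac{1}{s_1^2} - m^2 s_2^2\right)\right],$$

$$f_2(v) = C s_1 \exp\left[-\frac{v^2}{2}\left(\frac{1}{s_2^2} - m^2 s_1^2\right)\right]. \tag{12.34}$$

From the forms of the functions $f_1(u)$ and $f_2(v)$, we conclude that in
the case of a normal correlation the distribution of each variable
obeys a normal law separately.

Let us write these probability densities in the usual form

$$f_1(u) = \frac{1}{\sigma_u\sqrt{2\pi}}\,e^{-u^2/(2\sigma_u^2)}, \qquad f_2(v) = \frac{1}{\sigma_v\sqrt{2\pi}}\,e^{-v^2/(2\sigma_v^2)}. \tag{12.35}$$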




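If we equate these formulas with the preceding ones, we obtain

$$\sigma_u^2 = \frac{s_1^2}{1 - m^2 s_1^2 s_2^2}, \qquad \sigma_v^2 = \frac{s_2^2}{1 - m^2 s_1^2 s_2^2}. \tag{12.36}$$

Let us define

$$R^2 = m^2 s_1^2 s_2^2. \tag{12.37}$$

Then,

$$s_1^2 = \sigma_u^2\,(1 - R^2), \qquad s_2^2 = \sigma_v^2\,(1 - R^2).$$

If we substitute the expressions (12.34) and (12.32) that we have
found for the functions $f_1(u)$ and $\varphi_2(v \mid u)$ into equation (12.12) with $x$
and $y$ replaced by $u$ and $v$, we obtain the probability density of a
normal distribution of the variables $U$ and $V$:

$$f(u, v) = \frac{1}{2\pi\,\sigma_u \sigma_v \sqrt{1 - R^2}} \exp\left\{-\frac{1}{2(1 - R^2)}\left[\frac{u^2}{\sigma_u^2} - 2R\,\frac{uv}{\sigma_u \sigma_v} + \frac{v^2}{\sigma_v^2}\right]\right\}. \tag{12.38}$$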
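
The chain from (12.32) to (12.38) can be checked numerically at a single point. The following sketch uses hypothetical values $m = 0.5$, $s_1 = 1$, $s_2 = 1.2$ (our own choice, subject to $m^2 s_1^2 s_2^2 < 1$), takes $R = m s_1 s_2$ with the sign of $m$, computes $\sigma_u$ and $\sigma_v$ from (12.36), and compares the joint density (12.38) with the two products $f_1(u)\,\varphi_2(v \mid u)$ and $f_2(v)\,\varphi_1(u \mid v)$.

```python
import math


def normal_pdf(t, mean, sd):
    """Density of a one-dimensional normal law with the given mean and mean square deviation."""
    return math.exp(-(t - mean) ** 2 / (2.0 * sd * sd)) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical parameters of the conditional laws (assumptions for this sketch).
m, s1, s2 = 0.5, 1.0, 1.2

R = m * s1 * s2                                  # from (12.37), sign taken from m
sigma_u = s1 / math.sqrt(1.0 - R * R)            # from (12.36)
sigma_v = s2 / math.sqrt(1.0 - R * R)            # from (12.36)


def joint(u, v):
    """Joint normal density in the displaced variables, formula (12.38)."""
    q = (u / sigma_u) ** 2 - 2.0 * R * u * v / (sigma_u * sigma_v) + (v / sigma_v) ** 2
    norm = 2.0 * math.pi * sigma_u * sigma_v * math.sqrt(1.0 - R * R)
    return math.exp(-q / (2.0 * (1.0 - R * R))) / norm


u, v = 0.7, -1.1
# f1(u) * phi2(v | u): conditional mean m*s2^2*u and deviation s2, as in (12.31)-(12.32).
via_u = normal_pdf(u, 0.0, sigma_u) * normal_pdf(v, m * s2 * s2 * u, s2)
# f2(v) * phi1(u | v): conditional mean m*s1^2*v and deviation s1.
via_v = normal_pdf(v, 0.0, sigma_v) * normal_pdf(u, m * s1 * s1 * v, s1)
```

With these values $R = 0.6$, $\sigma_u = 1.25$, and $\sigma_v = 1.5$, and the three expressions for the density agree to within rounding.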


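The equations of regression with the new variables ($R$ instead of
$m$ and $\sigma_u$, $\sigma_v$ instead of $s_1$, $s_2$) are of the form

$$u = R\,\frac{\sigma_u}{\sigma_v}\,v, \qquad v = R\,\frac{\sigma_v}{\sigma_u}\,u. \tag{12.39}$$

Let us now find the center of a normal distribution. In the present
case,

$$m_1(v) = R\,\frac{\sigma_u}{\sigma_v}\,v, \qquad m_2(u) = R\,\frac{\sigma_v}{\sigma_u}\,u.$$

Therefore, from formulas (12.21),

$$\bar{u} = R\,\frac{\sigma_u}{\sigma_v}\,\frac{1}{\sigma_v\sqrt{2\pi}} \int_{-\infty}^{+\infty} v\,e^{-v^2/(2\sigma_v^2)}\,dv, \qquad \bar{v} = R\,\frac{\sigma_v}{\sigma_u}\,\frac{1}{\sigma_u\sqrt{2\pi}} \int_{-\infty}^{+\infty} u\,e^{-u^2/(2\sigma_u^2)}\,du.$$

The integrals on the right sides of these equations are obviously
equal to 0 and, consequently,

$$\bar{u} = 0, \qquad \bar{v} = 0. \tag{12.40}$$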


Remembering that our coordinate origin is situated at the point of
intersection of the lines of regression, we conclude that in the case
of a normal distribution of two random variables, the point of
intersection of the lines of regression coincides with the center of
distribution.*

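To find a ‘‘probabilistic expression’’ for the variable $R$, let us
calculate the expectation of the product of the random variables $U$
and $V$. If we substitute the value of $f(u, v)$ given by equation (12.38)
into the equation

$$E(UV) = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} uv\,f(u, v)\,du\,dv,$$

we obtain

$$E(UV) = \int_{-\infty}^{+\infty} \frac{u}{\sigma_u\sqrt{2\pi}}\,e^{-u^2/(2\sigma_u^2)} \left\{\int_{-\infty}^{+\infty} \frac{v}{\sigma_v\sqrt{2\pi}\sqrt{1 - R^2}} \exp\left[-\frac{\left(v - R\,\frac{\sigma_v}{\sigma_u}\,u\right)^2}{2\sigma_v^2 (1 - R^2)}\right] dv\right\} du.$$

To evaluate the inner integral, we make the substitution

$$z = \frac{v - R\,\frac{\sigma_v}{\sigma_u}\,u}{\sigma_v\sqrt{1 - R^2}}.$$

Then,

$$dv = \sigma_v\sqrt{1 - R^2}\,dz, \qquad v = \sigma_v\sqrt{1 - R^2}\,z + R\,\frac{\sigma_v}{\sigma_u}\,u.$$

Here, $u$ is considered a constant.
The integral with respect to $v$ takes the form

$$\frac{\sigma_v\sqrt{1 - R^2}}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} z\,e^{-z^2/2}\,dz + R\,\frac{\sigma_v}{\sigma_u}\,u\,\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-z^2/2}\,dz.$$

The first of these integrals vanishes since the integrand is an odd
function of $z$. The second is equal to $R\,(\sigma_v/\sigma_u)\,u$ because

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-z^2/2}\,dz = 1.$$

Consequently,

$$E(UV) = R\,\frac{\sigma_v}{\sigma_u}\,\frac{1}{\sigma_u\sqrt{2\pi}} \int_{-\infty}^{+\infty} u^2\,e^{-u^2/(2\sigma_u^2)}\,du.$$

Since $\bar{u} = 0$, the integral on the right, together with the factor
$1/(\sigma_u\sqrt{2\pi})$, is the variance of the variable $U$, which is equal to
$\sigma_u^2$. Therefore,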


*We note that in the present case the coordinates of the center of distribution coincide
with the theoretical means of each of the two individual variables.


$$E(UV) = R\,\sigma_u \sigma_v,$$

or

$$R = \frac{E(UV)}{\sigma_u \sigma_v}. \tag{12.41}$$

Let us go back to the original variables $x$ and $y$ in the equations
that we obtained for the probability density (12.38), the regression
equations (12.39), and the coefficient $R$ (12.41). Since $x$ and $y$ were
replaced by $u$ and $v$ by means of a displacement of the origin,

$$u = x - \bar{x}, \qquad v = y - \bar{y}.$$

Therefore, $\sigma_u = \sigma_x$ and $\sigma_v = \sigma_y$. The expression for the probability
density therefore takes the form


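$$f(x, y) = \frac{1}{2\pi\,\sigma_x \sigma_y \sqrt{1 - R^2}} \exp\left\{-\frac{1}{2(1 - R^2)}\left[\frac{(x - \bar{x})^2}{\sigma_x^2} - 2R\,\frac{(x - \bar{x})(y - \bar{y})}{\sigma_x \sigma_y} + \frac{(y - \bar{y})^2}{\sigma_y^2}\right]\right\}. \tag{12.42}$$

The regression equations are then written

$$y - \bar{y} = R\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x}), \qquad x - \bar{x} = R\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y}), \tag{12.43}$$

where the $y$ and $x$ on the left sides of these equations represent the
mean values obtained in the case of the definite values of $x$ and $y$
given on the right sides. Therefore, the $y$ and $x$ that appear on the
left are sometimes denoted by $y_x$ and $x_y$. For the coefficient $R$,
we obtain the expression

$$R = \frac{E[(X - \bar{x})(Y - \bar{y})]}{\sigma_x \sigma_y} = \frac{E(XY) - \bar{x}\,\bar{y}}{\sigma_x \sigma_y}. \tag{12.44}$$

The product $R\,\sigma_x \sigma_y$ is called the covariance. If we denote by $\rho_{yx}$ and
$\rho_{xy}$ the coefficients of regression, we have

$$\rho_{yx} = R\,\frac{\sigma_y}{\sigma_x}, \qquad \rho_{xy} = R\,\frac{\sigma_x}{\sigma_y}. \tag{12.45}$$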
The quantities $\bar{x}$, $\bar{y}$, $\sigma_x$, and $\sigma_y$ that appear in these formulas are the
theoretical mean values and mean square deviations of each of the
separate variables $X$ and $Y$. The point $(\bar{x}, \bar{y})$ is the center of
distribution and at the same time the intersection point of the lines of
regression. The significance of the quantity $R$ from a probability
standpoint is given by the following theorem:

For two variables obeying a normal law to be mutually independent,
it is necessary and sufficient that $R = 0$.

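Proof: We write $f(x, y)$ in the form

$$f(x, y) = \frac{1}{\sigma_x\sqrt{2\pi}} \exp\left[-\frac{(x - \bar{x})^2}{2\sigma_x^2}\right] \times \frac{1}{\sigma_y\sqrt{2\pi}} \exp\left[-\frac{(y - \bar{y})^2}{2\sigma_y^2}\right] \times$$

$$\times\; \frac{1}{\sqrt{1 - R^2}} \exp\left\{-\frac{R}{2(1 - R^2)}\left[\frac{R\,(x - \bar{x})^2}{\sigma_x^2} - \frac{2\,(x - \bar{x})(y - \bar{y})}{\sigma_x \sigma_y} + \frac{R\,(y - \bar{y})^2}{\sigma_y^2}\right]\right\}.$$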




In the preceding section (see formula (12.16)), the last factor was
denoted by $\varphi(x, y)$ and it was shown that the function $\varphi(x, y)$
completely determines the nature of the correlation; specifically, if
the variables are independent, $\varphi(x, y)$ must be identically equal to
unity and conversely. From the last formula, it is clear that in
the case of a normal correlation, the last factor will be identically
equal to unity only if $R = 0$.

It follows from the expressions for $s_1^2$ and $s_2^2$ that $|R| \le 1$. If
$|R| = 1$, then $s_1 = s_2 = 0$. Since $s_1$ and $s_2$ are the mean square
deviations of the conditional distributions, there corresponds in
this case only one value of $x$ (coinciding with the conditional mean value) to
each value of $y$ and conversely; that is, $x$ and $y$ are each single-valued
functions of the other. For, as $s_1 \to 0$, Gauss’ curve
characterizing the distribution of $X$ for a given constant value of
$Y = y$ tends to coincide with the ordinate; in other words, the
probability of any deviation from the mean approaches 0. Since the
theoretical lines of regression are straight, the uniquely determined
dependence between $X$ and $Y$ will be linear when $|R| = 1$.

Thus, this number $R$ characterizes the deviation of the correlation
from a linear functional relationship. The number $R$ is called
the correlation coefficient.

It follows from formulas (12.45) that

$$R^2 = \rho_{yx}\,\rho_{xy}.$$


Thus, the square of the correlation coefficient is equal to the 
product of the coefficients of regression. 
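
These relations can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the parameter values are hypothetical, and a pair $(x, y)$ is drawn by first sampling $x$ from its own normal law and then sampling $y$ from the conditional normal law whose mean lies on the line of regression (12.43) and whose mean square deviation is $\sigma_y\sqrt{1 - R^2}$. The sample analogues of (12.44) and (12.45) are then formed.

```python
import math
import random


def sample_pair(rng, x_mean, y_mean, sigma_x, sigma_y, R):
    """Draw (x, y): x from its own normal law, then y from the conditional law given x."""
    x = rng.gauss(x_mean, sigma_x)
    cond_mean = y_mean + R * sigma_y / sigma_x * (x - x_mean)   # line of regression (12.43)
    cond_sd = sigma_y * math.sqrt(1.0 - R * R)                  # conditional deviation
    return x, rng.gauss(cond_mean, cond_sd)


def sample_statistics(pairs):
    """Sample correlation coefficient and the two sample coefficients of regression."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    vx = sum((x - mx) ** 2 for x, _ in pairs) / n
    vy = sum((y - my) ** 2 for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    r_hat = cov / math.sqrt(vx * vy)          # sample analogue of (12.44)
    return r_hat, cov / vx, cov / vy          # sample analogues of (12.45)


rng = random.Random(12345)                    # fixed seed so the run is repeatable
# Hypothetical parameters: means 1 and -2, deviations 2 and 0.5, R = 0.6.
pairs = [sample_pair(rng, 1.0, -2.0, 2.0, 0.5, 0.6) for _ in range(20000)]
r_hat, b_yx, b_xy = sample_statistics(pairs)
```

For a sample of this size, `r_hat` should lie close to 0.6 and `b_yx` close to $R\,\sigma_y/\sigma_x = 0.15$, while the product `b_yx * b_xy` equals `r_hat ** 2` identically, in agreement with the relation between the correlation coefficient and the coefficients of regression.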

As we have noted, in the case of a normal distribution of two 
random variables, the mean value of each of these variables when 
the other is held constant is a linear function of the value at which 
the other variable is fixed. A correlation of this kind is called a 
linear correlation. On the basis of the linearity that we have shown
for the equations of regression, we may say that the assumption of 
a linear correlation between two variables can be considered 
justified if there is a reason for assuming that the random vari- 
ables in question obey a two-dimensional normal law of distribution. 
When this is the case, the set of two random variables is com- 
pletely characterized by five numbers: the individual mean values 
of the two variables, their mean square deviations, and the 
correlation coefficient. 


Ellipses of Equal Probabilities. 


From the general form of the probability density of two random variables, it follows that
the probability density is constant at all points of the $xy$-plane at which

$$\frac{(x - \bar{x})^2}{\sigma_x^2} - 2r\,\frac{(x - \bar{x})(y - \bar{y})}{\sigma_x \sigma_y} + \frac{(y - \bar{y})^2}{\sigma_y^2} = \lambda^2,$$

where $\lambda$ is an arbitrary constant and $r$ is the correlation coefficient.
The curve in the $xy$-plane described by this equation is an ellipse, a fact that is
easily verified by the methods of the theory of second-degree curves. Such an ellipse
is called an ellipse of equal probability, since at every point on it the probabilities are
equal for falling on equal elementary areas. We call such an ellipse a $\lambda$-ellipse and
denote the region in the $xy$-plane encircled by it with the letter $\Lambda$; we denote the
probability that a point will fall in the region by $P(\lambda)$.

For brevity in writing, we convert to the arguments $u$ and $v$, that is, to the deviations
of our variables from their expectations. The equation of a $\lambda$-ellipse in the new variables
takes the form

$$\frac{u^2}{\sigma_u^2} - 2r\,\frac{uv}{\sigma_u \sigma_v} + \frac{v^2}{\sigma_v^2} = \lambda^2, \qquad \sigma_u = \sigma_x, \quad \sigma_v = \sigma_y.$$


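By the definition of probability density, we have

$$P(\lambda) = \iint_{\Lambda} p(u, v)\,du\,dv,$$

where $p(u, v)$ denotes the probability density.
Let us use the polar coordinates

$$u = \rho\cos\theta, \qquad v = \rho\sin\theta.$$

By a transformation of the integral, we obtain

$$P(\lambda) = \frac{1}{2\pi\,\sigma_u \sigma_v \sqrt{1 - r^2}} \int_0^{2\pi} d\theta \int_0^{\rho_\lambda} e^{-\rho^2 s^2/2}\,\rho\,d\rho, \qquad \rho_\lambda = \frac{\lambda}{s\sqrt{1 - r^2}},$$

where

$$s^2 = \left[\frac{\cos^2\theta}{\sigma_u^2} - \frac{2r\cos\theta\sin\theta}{\sigma_u \sigma_v} + \frac{\sin^2\theta}{\sigma_v^2}\right] \frac{1}{1 - r^2}.$$

Integrating with respect to $\rho$, we have

$$P(\lambda) = \left[1 - \exp\left(-\frac{\lambda^2}{2(1 - r^2)}\right)\right] \frac{1}{2\pi\,\sigma_u \sigma_v \sqrt{1 - r^2}} \int_0^{2\pi} \frac{d\theta}{s^2}.$$

To simplify the evaluation of the integral with respect to $\theta$, we note that the ellipse
encompasses the entire plane as $\lambda$ approaches $\infty$. In other words, $P(\infty) = 1$. On the
other hand, from this formula we obtain

$$P(\infty) = \frac{1}{2\pi\,\sigma_u \sigma_v \sqrt{1 - r^2}} \int_0^{2\pi} \frac{d\theta}{s^2}.$$

Therefore,

$$P(\lambda) = 1 - \exp\left(-\frac{\lambda^2}{2(1 - r^2)}\right).$$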


In particular, if the variables are mutually independent, $r$ will be equal to 0. In this case,
for example,

$$P(2) = 0.865, \qquad P(3) = 0.989.$$

The semi-axes of the corresponding ellipses are respectively equal to twice and three
times the mean square deviation. Such ellipses may be called confidence limits in
analogy with the concept of the one-dimensional problem and the corresponding
probabilities may also be called confidence probabilities.
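
The two numerical values just quoted are easy to reproduce. The sketch below assumes that the final result has the form $P(\lambda) = 1 - \exp[-\lambda^2/(2(1 - r^2))]$, which yields exactly these values for $r = 0$; the function names and the simulation, which uses independent variables with unit mean square deviations, are our own illustrative additions.

```python
import math
import random


def prob_inside(lam, r=0.0):
    """Probability of falling inside the lambda-ellipse, assuming the form
    P(lambda) = 1 - exp(-lambda^2 / (2 * (1 - r^2)))."""
    return 1.0 - math.exp(-lam * lam / (2.0 * (1.0 - r * r)))


def simulated_share(lam, n=20000, seed=7):
    """Share of simulated points inside the lambda-ellipse for r = 0 and
    unit mean square deviations, where the ellipse is the circle u^2 + v^2 = lambda^2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        if u * u + v * v < lam * lam:
            hits += 1
    return hits / n


p2 = round(prob_inside(2.0), 3)   # 0.865
p3 = round(prob_inside(3.0), 3)   # 0.989
share2 = simulated_share(2.0)     # empirical counterpart of P(2)
```

The simulated share for $\lambda = 2$ should fall near 0.865, differing from it only by the sampling fluctuation of a finite sample.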


Part IV 


FUNDAMENTALS OF THE THEORY OF 
RANDOM MEASUREMENT ERRORS 


(The method of least squares) 


Chapter 13 


GENERAL REMARKS ON 
MEASUREMENT ERRORS 


65. TYPES OF MEASUREMENT ERRORS 


All measurements contain errors of various origins. It is cus- 
tomary to classify these errors as: 

(1) systematic, 

(2) random, 

(3) personal, and 

(4) gross. 


1. Systematic errors


The most important class of systematic errors is instrumental
errors. The instruments used to make measurements cannot be
constructed so as to be perfectly accurate. In the simplest case
of a direct measurement made on a calibrated instrument, the
spaces between the dividing lines are somewhat wider or narrower
than the nominal distance (for example, the space between
dividing lines may be 0.999 mm at one point on a ruler and 1.002
mm at another point, etc.). Such scale errors can be detected by
measuring the intervals with a precision instrument. The original
instrument can then be used along with a rating plate giving the
error in terms of the measured quantity (an interval, an angle,
etc.).

Sometimes, rather than measure the error, we may be able 
to organize the measurements in such a way that the error will be 
eliminated. A very familiar example is that of the error due to 
the eccentricity of a protractor. It is rarely possible to make the 
instrument in such a way that the geometric center of the circle 
divided into degrees will coincide sufficiently closely with the 
center of rotation of the circle. Therefore, the calibrated arc does 
not measure the required angle (error of eccentricity). To meas- 
ure and allow for the error due to eccentricity would be quite 
tedious. However, it is easy to eliminate this error by rotating the 
circle 180° and taking another reading. From geometric consider- 
ations, it is clear that the arithmetic mean of the two readings will 
give the exact value of the angle, regardless of the eccentricity. 
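
The cancellation of the eccentricity error can be sketched numerically. The model used here is an assumption of ours, not stated in the text: to first order, the reading at circle position $\theta$ is taken to be in error by $e \sin(\theta - \theta_0)$, where the amplitude $e$ and phase $\theta_0$ are hypothetical constants of the instrument. A second reading taken after rotating the circle by 180° then carries an error of opposite sign.

```python
import math


def reading(true_angle_deg, e_deg, phase_deg, offset_deg=0.0):
    """Circle reading under an assumed first-order eccentricity model:
    the error is e * sin(position - phase), with everything in degrees."""
    position = true_angle_deg + offset_deg
    return position + e_deg * math.sin(math.radians(position - phase_deg))


theta = 37.25                 # hypothetical true angle, degrees
e, phase = 0.02, 110.0        # hypothetical eccentricity amplitude and phase, degrees

first = reading(theta, e, phase)                               # ordinary reading
second = reading(theta, e, phase, offset_deg=180.0) - 180.0    # reading after a 180-degree rotation
mean_reading = 0.5 * (first + second)
```

Each individual reading is off by about 0.019° in this example, while the arithmetic mean of the two reproduces the true angle, because $\sin(\alpha + 180°) = -\sin\alpha$.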




Another very simple instrumental error is that of taking the wrong 
point as the zero point of a reading. As a result of such an error, 
all measurements will differ from the exact values by the same 
amount, which may be positive or negative. 

The errors connected with the instruments used in astronomy
must be studied; they must be measured for each individual device,
and they must be eliminated from the results of the measurements.
When possible, measurements should be made in such a way that
the error is excluded by a combination of two or more measurements.

Instrumental errors appear in every measurement result. They
may be constant or they may depend in some definite way on other
quantities, in particular, on the measured quantity itself. This is
the reason for such errors being called systematic errors.

Instrumental errors are not the only kind of systematic errors.
For example, in the differential determination of the coordinates 
of heavenly bodies (e.g., of small planets) from photographs, the 
positions of these bodies are measured with respect to base stars. 
Since the coordinates of the base stars, which are taken from 
catalogues, contain systematic errors (catalogue errors), the evalu- 
ated coordinates of the heavenly bodies in question will contain the 
same errors. These errors, like the instrumental errors, must 
be investigated and eliminated from the results of measurements. 

We may take as the general criterion for systematic errors 
the theoretical possibility of studying them and eliminating them 
from the results of measurements. Methods of allowing for 
systematic errors are considered in the various fields of astronomy
which depend on the results of observations. 


2. Random errors 


Experiment has shown that successive measurements of a single 
fixed quantity, made with the greatest possible care, give different 
numerical values even after all the known systematic errors are 
allowed for. This fact shows that physical causes of some sort 
have an effect on the results of measurements—causes for which 
we cannot make allowance. For example, suppose that an object
is being weighed on sufficiently accurate and sensitive scales.
If a door were to be slammed in the same building at the instant
that the measurement is taken, the needle of the scale would fluctuate
in an unpredictable direction and the number obtained would
differ from the exact value. If a heavy truck were to pass by at
the instant of another measurement, the new jolt would differ
from the preceding one and the result would be changed again.
If the measurements are not completely mechanical, if any human
being takes part in them, the random changes in the state of those
organs that he uses to make the measurements will have an effect
on the result.

A whole series of similar random causes may produce deviations
from an exact value. In each case, the deviation is slight: otherwise
it would be noticed and investigated. However, the total effect of
all these causes can yield significant deviations.

The theory of errors usually has to do with the theory of
random errors. For the construction of such a theory, the very
nature of random errors suggests the apparatus of probability
theory.


3. Personal errors. 


Experience in astronomical observations has shown that the
results of measurements depend to some degree on the physical
peculiarities of the observer (under otherwise equal conditions).
For example, in recording the instant of a phenomenon, one
observer may regularly notice a phenomenon somewhat sooner
than will another. Repeated study of the personal errors of different
observers has shown that these errors can be both systematic
and random. It is known that some average amount of
personal error is associated with an observer, and this error
should be considered systematic and taken into consideration in
the analysis of the observations. However, in individual observations,
the personal error is a random quantity that varies for
different reasons (the physical condition of the observer, external
conditions, etc.). Observations are made to determine the personal
errors and the results of these observations are analyzed in much
the same way as in the case of random errors, in order to obtain
their average value.


4. Gross errors. 


In the analysis of observations, we need to allow for the pos- 
sibility of blunders or external influences that cause completely 
inaccurate results. One of the simplest of these will be for an 
observer to read twenty and write down thirty, for example. A 
very simple example of an external cause of a large error that 
might not be noticed by the observer is a jolt that would distort 
the result. The presence of gross errors is detected by the fact 
that in a succession of comparatively close results only one or 
only a few values will differ appreciably from the general level of 
values; that is, these results stand out. If the discrepancy is great 
enough for us to be sure that it is the result of an error, the 
measurement can be disregarded. However, the situation is 
rarely that simple. If we keep in mind that a random error represents
the sum of a large number of small random errors, we may
conceive of an unfavorable combination of constituent small errors
that will give a random error that is large in absolute value. It is
true that the probability of such an unfavorable combination is
very small, but it is not equal to 0; that is, such a combination is
theoretically possible. Therefore, we may not as a rule immediately
call an observation a gross error merely because of a
sharp discrepancy. We shall consider gross errors from a
probability standpoint later on.


66. THE BASIC HYPOTHESIS IN THE THEORY OF 
RANDOM ERRORS. METHODS OF EVALUATING ERRORS 


The necessity of applying probability theory to the study of random
errors is rather obvious. Since a random measuring error is a
continuous random variable, in order to construct the theory we
must have the probability density or the distribution function.

Let us denote the unknown exact value of the quantity measured 
by a and let us denote an arbitrary measurement of this quantity 
by x. We apply the term ‘‘error’’ to the quantity 


$$\delta = x - a. \tag{13.1}$$


This error is considered random and consequently the values of $x$
are also random.

We make the following assumption: The random errors obey a
normal law of distribution with center equal to 0. According to
this assumption, the probability density of the random errors
is determined by the formula

$$\varphi(\delta) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\delta^2/(2\sigma^2)}, \tag{13.2}$$

from which it immediately follows that the probability density of
random measuring results is expressed by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x - a)^2/(2\sigma^2)}. \tag{13.3}$$


As we know, $\sigma^2$ in this expression is the variance of the random
errors. In what follows, we shall call the quantity $\sigma$ the mean
square error of a measurement. It is a numerical characteristic
of the quality of the set of measurements for which it is given or
calculated. The probabilistic significance of the quantity $\sigma$ is known
from the general theory of a normal distribution. As we have seen,

$$P(|\delta| < \sigma) \approx 0.68; \qquad P(|\delta| < 3\sigma) \approx 0.9973, \text{ etc.}$$


The greater the value of o, the poorer is the quality of the meas- 
urements. 


We use three other quantities to describe the quality of the
measurements:

(a) The probable error $r$, which is determined by the condition
that $P(|\delta| < r) = 0.5$.

(b) The absolute error

$$m = E(|\delta|),$$

that is, the expectation of the absolute value of the error (it may
be shown* that $m \approx 0.8\sigma$).

(c) The modulus of precision:

$$h = \frac{1}{\sigma\sqrt{2}} \approx \frac{0.7}{\sigma}.$$


The probable and absolute errors are great when the quality of the 
measurements is poor. The modulus of precision increases with 
increasing accuracy of measurements, 
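The relations among these quantities and the mean square error can be checked numerically. The following is a minimal sketch in Python, assuming the normal law (13.2); the function name `quality_measures` is hypothetical. The probable error is obtained as the 75 percent point of the normal law, since the condition $P(|\delta| < r) = 0.5$ leaves one quarter of the probability beyond $r$ on each side.

```python
import math
from statistics import NormalDist

def quality_measures(sigma):
    """Return (r, m, h) for normally distributed errors with mean square error sigma."""
    # Probable error: P(|delta| < r) = 0.5, so r is the 0.75 quantile of N(0, sigma).
    r = NormalDist(mu=0.0, sigma=sigma).inv_cdf(0.75)
    # Absolute error: m = E|delta| = sigma * sqrt(2 / pi).
    m = sigma * math.sqrt(2.0 / math.pi)
    # Modulus of precision: h = 1 / (sigma * sqrt(2)).
    h = 1.0 / (sigma * math.sqrt(2.0))
    return r, m, h

r, m, h = quality_measures(1.0)
print(round(r, 4), round(m, 4), round(h, 4))
```

For unit mean square error this gives approximately 0.67 for the probable error, 0.80 for the absolute error, and 0.71 for the modulus of precision.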

The assumption of a normal law of distribution can be justified 
by Lyapunov’s limit theorem if we accept the view stated above 
with regard to the net effect of a large number of small errors. 
It also follows from the conditions of Lyapunov’s theorem that the 
assumption of a normal law indicates that the constituent errors 
must be of approximately the same order. If, from physical con- 
siderations, we may expect that there is a single outstanding 
error, a supplementary assumption must be made with regard to 
the distribution law of the predominant error, retaining the 
assumption of the normal law for the set of the other constituent 
errors. Then we must derive a distribution law for the sum of 
the variables. ** 

The assumption of a normal law can be verified by observa- 
tions. Let us suppose that we have found an approximate value 
of the quantity in question with a sufficient accuracy to take it 
as the exact value (with a very small error). In practice, such 
a value can be derived from a very large number of observations.


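*From the definition of expectation,

\[ m = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{+\infty} |\delta|\, e^{-\frac{\delta^2}{2\sigma^2}}\, d\delta, \]

or

\[ m = \frac{2}{\sigma\sqrt{2\pi}} \int_{0}^{\infty} \delta\, e^{-\frac{\delta^2}{2\sigma^2}}\, d\delta. \]

If we make the change of variable $\frac{\delta^2}{2\sigma^2} = z$, we obtain

\[ m = \frac{2\sigma}{\sqrt{2\pi}} \int_{0}^{\infty} e^{-z}\, dz = \sigma\sqrt{\frac{2}{\pi}} \approx 0.8\sigma. \]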


**A question similar to this one is examined in the book by V. L. Goncharov.




The quantity $\sigma$ can then be evaluated as the square root of
the sum of the squares of the deviations of the results from the
"exact" value divided by the number of measurements.

Consequently, we shall assume that $a$ and $\sigma$ are exact. From
a property of the normal law, we can compute the probability that
the absolute value of the error will lie within arbitrary given
bounds:


\[ P(|\delta| < \alpha\sigma) = 2\Phi(\alpha), \quad \alpha > 0, \]

where $\Phi(z)$ is a tabulated function (see Chapter 11). (Here, the
bounds are taken in units of $\sigma$.) Let us assign various values to $\alpha$
within the limits between 0 and 3 and let us calculate the number
$n_\alpha$ of values of $\delta$ that satisfy the above inequality. We may also
calculate the theoretical number of such values

\[ n' = n \cdot 2\Phi(\alpha), \]


where n is the total number of observations. Comparison of the
table of values of $n'$ and $n_\alpha$ gives an indication of the acceptability
or unacceptability of a normal law. Such a check has been made
and its results are given in the book by A. N. Krylov, Lektsii o
priblizhennykh vychisleniyakh (Lectures on Approximate Calculations).
The above text also presents various derivations of the
normal law, on the basis of different types of hypotheses.
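The comparison of observed and theoretical counts described above can be sketched as follows. This is an illustration only: the error values are hypothetical, the function names are ours, and the tabulated function is assumed to be normalized as $\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_0^z e^{-t^2/2}\, dt$, which is the normalization for which $P(|\delta| < \alpha\sigma) = 2\Phi(\alpha)$ holds.

```python
from statistics import NormalDist

def laplace_phi(z):
    """Assumed normalization: integral of the standard normal density from 0 to z."""
    return NormalDist().cdf(z) - 0.5

def compare_counts(errors, sigma, alphas):
    """For each alpha, return (alpha, observed count, theoretical count n * 2 * Phi(alpha))."""
    n = len(errors)
    rows = []
    for alpha in alphas:
        observed = sum(1 for d in errors if abs(d) < alpha * sigma)
        theoretical = n * 2.0 * laplace_phi(alpha)
        rows.append((alpha, observed, theoretical))
    return rows

# Hypothetical errors, in the same units as sigma.
errors = [0.1, -0.5, 2.0, -1.2, 0.3, 0.8, -0.2, 1.1, -0.9, 0.4]
for alpha, observed, theoretical in compare_counts(errors, 1.0, [0.5, 1.0, 2.0, 3.0]):
    print(alpha, observed, round(theoretical, 2))
```

Close agreement of the two columns over the whole range of $\alpha$ speaks for the normal law; a systematic excess of large errors speaks against it.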

As with every mathematical theory, it is possible, in this
case, to begin with basic assumptions other than that of a normal
distribution. Thus, for example, in one of Gauss’ works, the
theory of random errors is based on the assumption that the most
probable value of a measured quantity is the arithmetic mean of
the values obtained in a number of equally accurate measurements.
With this postulate, it is possible to show that the errors obey a
normal distribution law.

Let us now consider a method of eliminating gross errors,
under the assumption of a normal distribution law of random
errors. Let us suppose that as a result of several measurements,
we have found an approximate value $\bar{x}$ of the measured quantity
and the mean square error $\sigma$ of a single measurement. Let us
determine the approximate value of the error of each measurement:

\[ \varepsilon_k = x_k - \bar{x} \approx \delta_k. \]

Because of the normal distribution,

\[ P(|\delta| < 3\sigma) = 0.9973, \]

and consequently,

\[ P(|\delta| > 3\sigma) = 0.0027. \]


It is usually assumed to be unlikely that the absolute value of the
error will exceed $3\sigma$. Therefore, if we find that the absolute value
of any of the $\varepsilon_k$ exceeds $3\sigma$, the corresponding measurement is
assumed to contain a gross error and is discarded. In certain
types of problems, conditions are considered under which measurements
with $|\varepsilon_k| > 2\sigma$ are discarded. After such observations
are discarded, it is necessary to recalculate both the approximate
value $\bar{x}$ and the number $\sigma$.
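The rejection rule can be sketched in a few lines of Python. This is an illustration under stated assumptions: the measurement values are hypothetical, the function names are ours, and the mean square error is estimated with the divisor n - 1, as in formula (14.20) later in the text. A single pass is made, followed by the recalculation that the text calls for.

```python
import math

def mean_and_sigma(values):
    """Arithmetic mean and mean square error of a single measurement (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    s = sum((x - mean) ** 2 for x in values)
    return mean, math.sqrt(s / (n - 1))

def discard_gross_errors(values, factor=3.0):
    """Drop measurements deviating from the mean by more than factor * sigma, then recompute."""
    mean, sigma = mean_and_sigma(values)
    kept = [x for x in values if abs(x - mean) <= factor * sigma]
    return kept, mean_and_sigma(kept)

# Twenty hypothetical measurements near 10 and one suspect value.
values = [10.0, 10.1, 9.9, 10.2, 9.8] * 4 + [15.0]
kept, (mean, sigma) = discard_gross_errors(values)
print(len(kept), round(mean, 3), round(sigma, 3))
```

With few measurements a single gross error inflates the estimate of $\sigma$ so much that no deviation can exceed $3\sigma$; the rule is therefore informative only when the number of measurements is not too small.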


Chapter 14 


ANALYSIS OF EQUALLY PRECISE 
MEASUREMENTS OF A FIXED QUANTITY 


67. THE PROBLEM OF ANALYZING MEASUREMENTS 
OF A FIXED QUANTITY 


Suppose that n measurements are made of an unknown quantity $a$
and that the results of these measurements are $x_1, x_2, \ldots, x_n$.
From this set of numbers, we must derive the approximate value
of $a$ that is (in some specified sense) the most acceptable, and
also the approximate value of the mean square error of an individual
measurement. The expression "most acceptable" value usually
means the most probable value. The value taken for $a$ is a function
of the random numbers $x_1, x_2, \ldots, x_n$. Therefore, it too should be
considered random. We then have the problem of a probabilistic
estimate of the accuracy of the result. In other words, in addition
to the mean square error of an individual measurement, we must
also compute the mean square error of the most probable value
of the measured quantity.

However, we first need to introduce the concept of equal precision
of measurements. This is the probabilistic expression of
the "ordinary" concept of equal accuracy of all the results
$x_1, x_2, \ldots, x_n$ of measurements of the quantity in question. The
kth measurement (for $k = 1, 2, \ldots, n$) gives a random result $x_k$. The
randomness appears in the fact that when such a series of measurements
is repeated, the kth measurement will give a new value $x_k'$
different, as a rule, from $x_k$. The set of all possible results of the
kth measurement is determined in the general case by the probability
density with parameter $\sigma_k$. The formal test of equal precision
of measurements is the equality of all the $\sigma_k$, independent of the
subscript $k$. (We denote the common value of the $\sigma_k$ by $\sigma$.)


68. THE MOST PROBABLE VALUE OF A MEASURED 
QUANTITY. THE METHOD OF LEAST SQUARES 


Suppose that n equally precise measurements $x_1, x_2, \ldots, x_n$ of a
fixed quantity $a$ are made. Let us also suppose that the quantity $a$
is known. Then, we may determine the errors of the individual
measurements

\[ \delta_k = x_k - a. \tag{14.1} \]


We also assume (in the present section) that the mean square
error $\sigma$ of a single measurement is known. By using the properties
of probability density, we can write an approximate expression for
the probability that the error in the kth measurement will be close
to that obtained:

\[ P_k(\delta_k \le \delta < \delta_k + \Delta\delta) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{\delta_k^2}{2\sigma^2}}\, \Delta\delta, \tag{14.2} \]


where $\Delta\delta$ is an arbitrary small positive number. We adopt the
following notation to write this equation more briefly:

\[ P_k(\delta \approx \delta_k) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{\delta_k^2}{2\sigma^2}}\, \Delta\delta. \]


Now, let us drop the assumption that the quantity $a$ is exactly
known and let us make various hypotheses about the value of the
quantity $a$. To each such hypothesis, there corresponds a set of
numbers $\delta_k$ and a corresponding set of probabilities $P_k$. The
conditional probability $P_k$ of the event that $\delta \approx \delta_k$ in the kth
measurement, given an arbitrary hypothesis as to the value of $a$, is
given by the equation

\[ P_k(\delta \approx \delta_k \mid a) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{\delta_k^2}{2\sigma^2}}\, \Delta\delta. \tag{14.3} \]


The left side of this equation should be read "the probability that
the error in the kth measurement will be approximately equal to
$\delta_k$ (i.e., between $\delta_k$ and $\delta_k + \Delta\delta$), under the hypothesis in question
with regard to the quantity $a$."

Analogous equations may be written for all the measurements.
Each of them determines the probability of the event referred to.
Let us assume that the measurements are mutually independent,
that is, that the probability of each value of the error in the kth
measurement is independent of the value of the errors made in the
other measurements. The corresponding events (the occurrence of
errors of different magnitude) are also mutually independent.

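Let us now calculate the conditional probability of obtaining a
definite set of errors $\delta_1, \delta_2, \ldots, \delta_n$, given a particular hypothesis
with regard to the quantity $a$. This probability can be denoted
by $P(\delta \approx \delta_1, \delta_2, \ldots, \delta_n \mid a)$. Since we have a set of mutually independent
events (errors), the probability that we are seeking can be found
by using the theorem on multiplication of probabilities for mutually
independent events. By use of (14.3), we obtain

\[ P(\delta \approx \delta_1, \delta_2, \ldots, \delta_n \mid a) = \left(\frac{\Delta\delta}{\sigma\sqrt{2\pi}}\right)^n \exp\left[-\frac{1}{2\sigma^2} \sum_{k=1}^{n} \delta_k^2\right]. \tag{14.4} \]

Since, by (14.1), $\delta_k = x_k - a$, we have

\[ P(X \approx x_1, x_2, \ldots, x_n \mid a) = \left(\frac{\Delta\delta}{\sigma\sqrt{2\pi}}\right)^n \exp\left[-\frac{1}{2\sigma^2} \sum_{k=1}^{n} (x_k - a)^2\right]. \tag{14.5} \]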


The right side of this equation is a function of the quantity a. If 
we assign different values to a, we shall obtain different proba- 
bilities. 

Let us now suppose that all the hypotheses concerning the
variable $a$ in some arbitrary region are equally probable. Then,
from the corollary to Bayes’ formula, the probabilities of the
different hypotheses, after the occurrence of the event, are
proportional to the conditional probabilities of the event under
these hypotheses. In the present case, the event is the occurrence
of the numbers $x_1, x_2, \ldots, x_n$. Therefore, we may write

\[ P(a \mid x_1, x_2, \ldots, x_n) = C \exp\left[-\frac{1}{2\sigma^2} \sum_{k=1}^{n} (x_k - a)^2\right], \tag{14.6} \]


where C is a proportionality constant. 

From equation (14.6), the probability of a hypothesis concerning
the variable $a$ depends on the value of $a$ that appears in the
exponent on the right. It is easy to see that the probability $P$ has its
greatest value when the quantity

\[ S(a) = \sum_{k=1}^{n} (x_k - a)^2 \tag{14.7} \]


is smallest. The problem of determining the most probable value
of $a$ is thus reduced to finding that value of $a$ at which $S(a)$ is
minimum. As we know from analysis, a necessary condition for
an extremum of $S(a)$ is that

\[ \frac{dS}{da} = -2 \sum_{k=1}^{n} (x_k - a) = 0. \]


Since $S(a)$ is a positive-definite quadratic form in $a$, $S$ can have
only one extremum and that one must be a minimum. When we set
the first derivative equal to 0, we obtain the most probable value
of the unknown variable $a$:

\[ a_{\text{prob}} = \frac{1}{n} \sum_{k=1}^{n} x_k. \tag{14.8} \]


Thus, we arrive at the following conclusion: The most probable
value of a fixed quantity on which equally precise measurements
have been made is the arithmetic mean of these measurements.
Henceforth, instead of $a_{\text{prob}}$, we shall use the notation $\bar{x}$:

\[ \bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k. \tag{14.9} \]


This method of calculating an approximate value of the variable a 
is called ‘‘the method of least squares,’’ since we use the condition 
that the sum of the squares of the errors S(a) be minimized. 

From (14.9) we have the following property of the arithmetic
mean, which we shall find useful in the future: the sum of all the
deviations from the arithmetic mean is equal to 0. This is true
because, according to (14.9),

\[ \sum_{k=1}^{n} (x_k - \bar{x}) = \sum_{k=1}^{n} x_k - n\bar{x} = 0. \tag{14.10} \]
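Both properties of the arithmetic mean, the minimum of $S(a)$ and the vanishing sum of deviations, can be verified on a small example. The sketch below uses hypothetical measurement values and exact rational arithmetic, so that the sum of deviations comes out exactly zero rather than zero up to rounding; the function name `S` mirrors the notation of (14.7).

```python
from fractions import Fraction

def S(a, xs):
    """Sum of squared errors under the hypothesis that the measured quantity equals a."""
    return sum((x - a) ** 2 for x in xs)

# Hypothetical measurements, held as exact fractions.
xs = [Fraction(v) for v in ("3.48", "3.51", "3.47", "3.50")]
mean = sum(xs) / len(xs)

# Property (14.10): the deviations from the mean sum to zero.
print(sum(x - mean for x in xs))

# The mean gives a smaller S than nearby hypotheses on either side.
step = Fraction(1, 100)
print(S(mean, xs) < S(mean - step, xs), S(mean, xs) < S(mean + step, xs))
```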


69. THE MEAN SQUARE ERROR OF THE 
ARITHMETIC MEAN 


In a given set of n observations, the arithmetic mean $\bar{x}$ is a linear
function of the results $x_1, x_2, \ldots, x_n$ of these observations (cf.
equation (14.8)). If we again make a set of n measurements, the
values $x_k$ will, because of random influences, differ from those
obtained the first time. Therefore, the value of $\bar{x}$ will be different.
The same will be true of all subsequent sets of measurements.
Therefore, the number $\bar{x}$ that we obtain as a result of one set of
measurements is a random approximate value of the desired
quantity. To get an idea of the possible deviations of $\bar{x}$ from the
desired exact value, we need to calculate the mean square error
of the arithmetic mean. To keep in mind just what it is that we
are doing here, we should note that the mean square error of the
arithmetic mean is a probability characteristic of the set of all
possible values of the arithmetic mean that can be obtained from
the n measurements.

Since the results of the measurements are assumed to represent
mutually independent random variables, application of the theorem
on the variance of a linear function of such variables yields

\[ \operatorname{var} \bar{x} = \frac{1}{n^2} \sum_{k=1}^{n} \operatorname{var} x_k. \]

Since the measurements are assumed to be equally precise,

\[ \operatorname{var} x_k = \sigma^2 \]

for all $k$. Therefore

\[ \operatorname{var} \bar{x} = \frac{\sigma^2}{n} \]

and

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}. \tag{14.11} \]

It should be noted that this formula is precise, but the quantity $\sigma$
itself is unknown.
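That formula (14.11) is exact and not merely approximate can be illustrated by direct enumeration. The sketch below is an assumption-laden toy: the error takes the hypothetical values -1, 0, 1 with equal probability (variance 2/3), and every possible set of n independent measurements is listed so that the variance of the arithmetic mean is computed exactly.

```python
from fractions import Fraction
from itertools import product

# Hypothetical error law: three equally likely values with variance 2/3.
outcomes = [Fraction(-1), Fraction(0), Fraction(1)]

def variance_of_mean(n):
    """Exact variance of the arithmetic mean over all equally likely sets of n results."""
    means = [sum(sample) / n for sample in product(outcomes, repeat=n)]
    expectation = sum(means) / len(means)
    return sum((m - expectation) ** 2 for m in means) / len(means)

for n in (1, 2, 4):
    print(n, variance_of_mean(n))
```

The printed variances equal 2/3 divided by n, so the mean square error of the mean falls as the square root of the number of measurements, whatever the error law, provided the measurements are independent and equally precise.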

We may consider as exceptional the rare cases in which experimental
measurements are made with very small random errors.
In the case of such high accuracy measurements, the results will
be expressed as many-digit numbers with a large number of
initial digits in common. Such measurements, for example, might
be 3.48754, 3.48733, 3.48761, etc. In such a case, we may take 
the common part of all the results as the exact value of the 
measured quantity. In the present example, this would be 3.487, 
which has four significant figures. If measurements of lower 
accuracy are made, the individual measurements will not all 
yield four reliable digits. For example, we may obtain 3.485, 
3.488, etc. Deviations from the ‘‘exact’’ value can be considered 
exact errors (‘‘exact’’ in a conventional sense, specifically, having 
a specified number of significant figures). The mean square error 
of a single measurement, on the basis of n measurements under 
these conditions, is defined as the square root of the sum of the 
squares of the errors divided by the number of measurements.
However, it should be noted that we still do not obtain the "exact"
value of $\sigma$. The quantity $\sigma$ is a parameter of the normal distribution
law of the errors. To find its value with a sufficient degree
of reliability, we would have to make many similar measurements
and deduce the value of $\sigma$ from them.


70. THE MOST PROBABLE VALUE OF THE MEAN SQUARE 
ERROR OF AN INDIVIDUAL MEASUREMENT 


In the derivation of the most probable value of a measured quantity,
it was assumed that the quantity $\sigma$ was known. Therefore, the
probability of obtaining a certain set of errors (which is equal to
the probability of obtaining a certain set of measurement results)
depended only on the hypothesis made regarding the variable $a$.
Let us now assume that the quantity $\sigma$ is also unknown. To
examine the probability of the set of errors, we now need to make
hypotheses regarding the values of both $a$ and $\sigma$. Just as was the
case in Section 68, the probability of obtaining the results
$x_1, x_2, \ldots, x_n$ under these hypotheses is given by the formula

\[ P(X \approx x_1, x_2, \ldots, x_n \mid a, \sigma) = \left(\frac{\Delta\delta}{\sqrt{2\pi}}\right)^n \sigma^{-n} \exp\left[-\frac{S(a)}{2\sigma^2}\right], \tag{14.12} \]

where

\[ S(a) = \sum_{k=1}^{n} (x_k - a)^2. \tag{14.13} \]

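Let us rewrite the quantity $S(a)$ as follows:

\[ S(a) = \sum_{k=1}^{n} (x_k - a)^2 = \sum_{k=1}^{n} \left[(x_k - \bar{x}) + (\bar{x} - a)\right]^2 = \sum_{k=1}^{n} (x_k - \bar{x})^2 + 2(\bar{x} - a) \sum_{k=1}^{n} (x_k - \bar{x}) + n(\bar{x} - a)^2. \]

From property (14.10) of an arithmetic mean, we have the exact
equation

\[ \sum_{k=1}^{n} (x_k - \bar{x}) = 0 \]

(if $\bar{x}$ is exactly computed). Therefore,

\[ S(a) = \sum_{k=1}^{n} (x_k - \bar{x})^2 + n(\bar{x} - a)^2. \]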

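The probability of the set of measurements can then be written in
the form

\[ P(X \approx x_1, x_2, \ldots, x_n \mid a, \sigma) = \left(\frac{\Delta\delta}{\sqrt{2\pi}}\right)^n \frac{\sqrt{2\pi}}{\sqrt{n}}\, \sigma^{-n+1} \exp\left[-\frac{\sum_{k=1}^{n} (x_k - \bar{x})^2}{2\sigma^2}\right] \times \frac{1}{\sqrt{2\pi}\, \frac{\sigma}{\sqrt{n}}} \exp\left[-\frac{(a - \bar{x})^2}{2\left(\frac{\sigma}{\sqrt{n}}\right)^2}\right]. \tag{14.14} \]

Here, the desired probability is a function of the two independent
parameters $a$ and $\sigma$.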

To obtain a rule for finding the most probable value of $\sigma$ that
is suitable for all values of $a$, let us assume that all values of $a$
are equally probable in some region from $-\alpha$ to $\alpha$ that is large
enough for us to be able to consider it as practically equivalent
to the interval from $-\infty$ to $+\infty$. For this, it is sufficient that
$\alpha$ be a magnitude of the order of $3\sigma$ to $4\sigma$.

Let us multiply both sides of this equation by $da$ and let us
integrate with respect to $a$ from $-\infty$ to $+\infty$. From the
generalized theorem on the addition of probabilities, we obtain
on the left the probability of the set of measurements for all
values of $a$ with fixed $\sigma$. On the right side, we need to integrate
only the last factor since it is the only one containing $a$. Formally,
this factor is the normal probability density of the "random
variable" $a$. Therefore, its integral from $-\infty$ to $+\infty$ is unity,
and we obtain

\[ P(X \approx x_1, x_2, \ldots, x_n \mid \sigma) = \left(\frac{\Delta\delta}{\sqrt{2\pi}}\right)^n \frac{\sqrt{2\pi}}{\sqrt{n}}\, \sigma^{-n+1} \exp\left(-\frac{S}{2\sigma^2}\right), \tag{14.15} \]

where

\[ S = \sum_{k=1}^{n} (x_k - \bar{x})^2. \tag{14.16} \]


Our reasoning now proceeds along the same lines as in Section
68. We have the probability of the occurrence of our measurement
results under the chosen hypothesis regarding the quantity $\sigma$ and
with arbitrary values of $a$. This is the conditional probability of
the event $(X \approx x_1, x_2, \ldots, x_n)$ under the hypothesis regarding the
quantity $\sigma$. Let us suppose that prior to the observations, all the
hypotheses regarding the quantity $\sigma$ are equally likely. From the
corollary to Bayes’ formula, the probabilities of the hypotheses
regarding the value of $\sigma$ after the occurrence of the event
$x_1, x_2, \ldots, x_n$ are proportional to the conditional probabilities of
the event under these hypotheses. Therefore,

\[ P(\sigma \mid x_1, x_2, \ldots, x_n) = C \sigma^{-n+1} e^{-\frac{S}{2\sigma^2}}, \tag{14.17} \]

where the coefficient $C$ includes the proportionality constant appearing
in the corollary of Bayes’ formula as well as all the factors
that do not depend on $\sigma$.

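Let us now determine the most probable value of $\sigma$ (that is,
of the mean square error of an individual measurement) that can
be obtained from a given set of measurements. For this, we
need to find that value of $\sigma$ at which the function

\[ P(\sigma) = C \sigma^{-n+1} e^{-\frac{S}{2\sigma^2}} \]

has a maximum. If we take the logarithm of both sides and then
differentiate twice with respect to $\sigma$, we obtain

\[ \ln P = \ln C + (1 - n) \ln \sigma - \frac{S}{2\sigma^2}, \]

\[ \frac{1}{P} \frac{dP}{d\sigma} = \frac{1 - n}{\sigma} + \frac{S}{\sigma^3}, \]

\[ -\frac{1}{P^2} \left(\frac{dP}{d\sigma}\right)^2 + \frac{1}{P} \frac{d^2 P}{d\sigma^2} = \frac{n - 1}{\sigma^2} - \frac{3S}{\sigma^4}. \]

If we set the first derivative equal to 0, we obtain

\[ \sigma_{\text{prob}}^2 = \frac{S}{n - 1}. \tag{14.18} \]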


If we substitute this value of $\sigma$ into the equation containing the
second derivative and remember that $\left(\frac{dP}{d\sigma}\right)_{\sigma = \sigma_{\text{prob}}} = 0$, we obtain

\[ \left(\frac{1}{P} \frac{d^2 P}{d\sigma^2}\right)_{\sigma = \sigma_{\text{prob}}} = -\frac{2(n - 1)^2}{S} < 0, \tag{14.19} \]

which confirms that the value obtained for $\sigma$ is the most probable
one.

Henceforth, instead of $\sigma_{\text{prob}}$ we shall write simply $\sigma$. If we
substitute into (14.18) the value of $S$ given by (14.16), we obtain

\[ \sigma = \sqrt{\frac{\sum_{k=1}^{n} (x_k - \bar{x})^2}{n - 1}}. \tag{14.20} \]

After computing $\bar{x}$ and $\sigma$, we need to look through a table of values
of $\varepsilon_k = x_k - \bar{x}$. If any of these numbers have absolute values exceeding
$3\sigma$, the corresponding measurements should be discarded and $\bar{x}$ and $\sigma$ must be recomputed.
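The location of the maximum can also be confirmed numerically. The sketch below evaluates the logarithm of $P(\sigma)$ (up to the constant $\ln C$) on a grid and compares the best grid point with formula (14.18). The values S = 50.28 and n = 14 are those of the example in Section 72; the grid bounds and step are arbitrary choices of ours.

```python
import math

def log_p(sigma, S, n):
    """ln P(sigma) without the constant ln C: (1 - n) ln(sigma) - S / (2 sigma^2)."""
    return (1 - n) * math.log(sigma) - S / (2.0 * sigma ** 2)

S, n = 50.28, 14                      # values taken from the example of Section 72
sigma_prob = math.sqrt(S / (n - 1))   # formula (14.18)

# Search a grid of candidate values from 0.5 to about 4.5 in steps of 0.001.
grid = [0.5 + 0.001 * i for i in range(4000)]
best = max(grid, key=lambda s: log_p(s, S, n))

print(round(sigma_prob, 3), round(best, 3))
```

The grid maximum agrees with $\sqrt{S/(n-1)}$ to within the grid step.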


71. A SECOND DERIVATION OF THE APPROXIMATE
VALUE OF A MEASURED QUANTITY AND OF THE
APPROXIMATE VALUE OF THE MEAN SQUARE
ERROR OF AN INDIVIDUAL MEASUREMENT


Let us consider the set of measurements as a sample of n items out of the infinitely
many results that are possible due to random errors. We base the theory on the following
two hypotheses:

(1) The expectation of each measurement result is equal to the exact value of the
measured quantity;

(2) The variance of each measurement result is the same for all measurements.

The first of these can be considered natural. The second means that we are dealing
with equally precise measurements. We make no assumption as to the law of distribution
of the random errors of the measurements.

According to the corollary to Chebyshev’s theorem, if a sufficiently large number of
measurements are made, we may assert with probability arbitrarily close to unity that
the exact value and the arithmetic mean of the sample values will be arbitrarily close.
This provides the basis for taking the arithmetic mean as the approximate value of the
measured quantity. Consequently, we postulate the approximate equation

\[ a \approx \frac{1}{n} \sum_{k=1}^{n} x_k. \]


When a large number of observations are made, such an approximation is, to a certain
extent, justified by Chebyshev’s theorem. When a small number of observations are
made, we can consider it only a convention.

However, we may apply a criterion proposed by A. A. Markov for an approximation
that does not contain systematic errors. We denote by $\xi$ an arbitrary approximation of
the quantity $a$. This approximation will not contain systematic errors if the expectation
of the approximation is equal to the expectation of the approximated quantity (which, in
the present case, is the exact value). We denote the random results of the measurements
by $\xi_1, \xi_2, \ldots, \xi_n$. Their arithmetic mean is

\[ \bar{\xi} = \frac{\xi_1 + \xi_2 + \cdots + \xi_n}{n}. \]

From the first assumption,

\[ E(\bar{\xi}) = \frac{a + a + \cdots + a}{n} = a. \]


Thus, if we take the arithmetic mean of the measured values as the approximate value, we
satisfy Markov’s criterion.

To get an approximate value of the mean square error of a single measurement, we
proceed as follows. For the items in our sample, we find the expectation of the sum of
the squares of the deviations of the individual values from their arithmetic mean; that is,
we find the quantity

\[ E\left\{\sum_{k=1}^{n} (\xi_k - \bar{\xi})^2\right\}. \]

We expand the expression in the braces and apply the equation $\sum_{k=1}^{n} \xi_k = n\bar{\xi}$. We then obtain

\[ E\left\{\sum_{k=1}^{n} (\xi_k - \bar{\xi})^2\right\} = E\left\{\sum_{k=1}^{n} \xi_k^2 - n\bar{\xi}^2\right\} = \sum_{k=1}^{n} E(\xi_k^2) - nE(\bar{\xi}^2). \]

Since $a = E(\xi_k) = E(\bar{\xi})$, from the property of the variance, for the random variables
$\xi_k$ ($k = 1, 2, \ldots, n$) we have

\[ E(\xi_k^2) = \sigma^2 + a^2, \qquad E(\bar{\xi}^2) = \sigma_{\bar{\xi}}^2 + a^2. \]

The quantity $\sigma_{\bar{\xi}}^2$ is determined from the theorem on the variance of a linear function:

\[ \sigma_{\bar{\xi}}^2 = \frac{\sigma^2}{n}; \]

hence,

\[ E(\bar{\xi}^2) = \frac{\sigma^2}{n} + a^2. \]

Therefore,

\[ E\left\{\sum_{k=1}^{n} (\xi_k - \bar{\xi})^2\right\} = n(\sigma^2 + a^2) - n\left(\frac{\sigma^2}{n} + a^2\right) = (n - 1)\sigma^2. \]

Thus, we come to the exact formula: the expectation of the sum of the squares of the
deviations (of the sample random values) from the arithmetic mean of these values is
equal to the variance of a single measurement times n - 1, where n is the number of
elements in the sample.

The expectation of the sum of the squares of the deviations from the arithmetic mean
is the mean value of this quantity as obtained from all possible samples. Instead of the
mean value of the sum of the squares of the deviations, we substitute into the exact formula
that value that is obtained for one sample. We then obtain an approximate formula for
calculating the variance of a single measurement:

\[ \sigma^2 \approx \frac{\sum_{k=1}^{n} (x_k - \bar{x})^2}{n - 1} \]

(instead of $\xi_k$ and $\bar{\xi}$, we write $x_k$ and $\bar{x}$). Note that in the derivation of this formula
no assumptions were made as to the distribution law of the random errors.
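Since no distribution law is assumed, the exact formula can be tested on a law that is plainly not normal. The sketch below is illustrative only: the three equally likely measurement values 0, 1, 4 are hypothetical, and the expectation is computed exactly by listing every possible sample of n items.

```python
from fractions import Fraction
from itertools import product

# Hypothetical, deliberately non-normal law: three equally likely results.
values = [Fraction(0), Fraction(1), Fraction(4)]

def single_variance():
    """Variance of one measurement under the law above."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)

def expected_sum_of_squares(n):
    """Exact expectation of the sum of squared deviations from the sample mean."""
    total, count = Fraction(0), 0
    for sample in product(values, repeat=n):
        mean = sum(sample) / n
        total += sum((x - mean) ** 2 for x in sample)
        count += 1
    return total / count

for n in (2, 3, 4):
    print(n, expected_sum_of_squares(n), (n - 1) * single_variance())
```

The two printed columns coincide for each n, which is why division by n - 1, and not by n, gives an estimate of the variance free of systematic error.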




72. AN EXAMPLE OF THE ANALYSIS OF EQUALLY 
PRECISE MEASUREMENTS OF A SINGLE 
FIXED QUANTITY 


The geographic latitude of the Tashkent observatory has been
determined from the observations of fourteen stars. The results
are given in column II of the following table:

    Column II (observed latitude $\varphi_k$):

        41°19'33.10"
        ...
              28.95
              30.08
              28.81
              31.16
              31.38
              32.15
              34.89
              33.34
              32.39
              31.51
              31.35
              27.80

    Column sums:

        III. $\sum x_k$ = +19.87"
        IV.  $\sum (x_k - \bar{x})$ = (-0.01")
        V.   $\sum (x_k - \bar{x})^2$ = 50.28

    Column VI (calculations):

        $x_0$ = 41°19'30";  $x_k = \varphi_k - x_0$
        $\bar{x}$ = 19.87" : 14 = 1.42"
        $\varphi$ = 41°19'30" + 1.42" = 41°19'31.42"
        $\sigma^2$ = 50.28 : (14 - 1) = 3.87;  $\sigma$ = 1.97"
        $\sigma_{\bar{x}}$ = 1.97" : $\sqrt{14}$ = 0.53"
        $\varphi$ = 41°19'31.42" ± 0.53"
        $\varphi$ = 41°19'31.4" ± 0.5"


Since the observations were made under approximately equal
conditions, they may be considered equally precise. Column III
gives the deviations of the values obtained for the latitude from the
value 41°19'30"; these deviations are denoted by the symbol
$x_k$ (for $k = 1, 2, \ldots, 14$). This was done only to simplify the calculations.
The use of these numbers instead of the $\varphi_k$ is equivalent to
changing the origin from which the latitudes are calculated. It is
necessary to take this into consideration only in deriving the most
probable value of the latitude from all the observations. The new
origin is not involved in the calculation of the mean square errors
because, as can easily be seen, the displacement of the origin
does not change the deviation from the mean. Column IV gives the
deviations of the values of $x_k$ from the mean $\bar{x}$. For the reasons
just given, these numbers represent the deviations of the numbers
$\varphi_k$ from their mean value. The sum of the numbers in column IV
can be calculated as a check: if the mean $\bar{x}$ were formally calculated
exactly (when $\sum_{k=1}^{n} x_k$ can be divided evenly by n, with the
same number of digits after the decimal point as are taken in the
calculations), this sum would have to be exactly equal to 0.
Usually, the mean is calculated with a limiting error equal to one
half the unit of the last digit. In this checking operation, the
numbers $x_k$ should be assumed exact. Therefore, the absolute value
of the sum of the numbers in column IV must not exceed the product of one half
the unit of the last digit in $\bar{x}$ and the number of measurements.

Column V gives the squares of the deviations. Their sum is
necessary for calculating the mean square errors. In practice,
instead of writing out the squares of the individual deviations,
we may obtain the entire sum of the squares of the deviations
on a calculating machine by the method of "accumulation." If
we do this, however, we do not have the sum-of-squares check
and, therefore, the check must be made in another manner,
namely, by using the following identity, which is a consequence of
the formula for calculating the variance:

\[ \sum_{k=1}^{n} (x_k - \bar{x})^2 = \sum_{k=1}^{n} x_k^2 - n\bar{x}^2. \]

This identity is satisfied almost exactly because, in the calculation
of the sum of the squares on the calculating machine, all the
digits formally obtained are usually used and there will only be
the discrepancy resulting from the inexactness of the mean $\bar{x}$. In
each particular case, we may calculate the limiting value of the
difference between the two sides of the identity.

We now make some clarifying remarks about the calculations
in the last column.

All the calculations of the most probable value of the latitude
and of the mean square errors appear in column VI. The number
$x_0$, referred to above, is written first. The mean value $\bar{x}$ is then
computed. When this number is added to $x_0$, we obtain the arithmetic
mean, that is, the most probable value of the latitude. If
we divide the sum of the squares of the deviations from the mean
by the number of measurements minus 1, we find the square of
the most probable value of the mean square error of a single
measurement. If we divide $\sigma$ by the square root of the number
of measurements, we get the mean square error of the result.
The result is usually written as shown in the next-to-the-last
row of column VI. Sometimes, the probable (instead of the mean
square) error is given. Therefore, it should be stated what kind
of error is given in the result.

It is clear from the value of the mean square error of the
mean value that the hundredths of a second are completely unreliable
in the result. Therefore, it would be better to give the
final result only up to the tenths of a second, as was done in the
last row of column VI. The probabilistic meaning of the result
follows from the properties of the normal distribution. The actual
value of the latitude lies between 41°19'30.9" and 41°19'31.9"
with probability 0.68; it lies between 41°19'29.9" and 41°19'32.9"
with probability 0.9973.

Since a reliable knowledge of the sum of the squares of the
deviations is necessary for the determination of the mean errors,
we shall check it by means of the identity given above. We obtain

\[ \sum_{k=1}^{n} (x_k - \bar{x})^2 = 78.49 - 28.23 = 50.26. \]

This difference between the initial value and the check is insignificant.
Therefore, we may be sure of the accuracy of the calculations.

Finally, by using the mean square error $\sigma$ of a single measurement,
we may see whether there are gross errors in the measurements
or not. To do this, we take the deviation from the mean
$x_k - \bar{x}$ (for $k = 1, 2, \ldots, 14$) of greatest absolute value and we find
the ratio of this deviation to the mean square error of a single
measurement. In the present example, we have 3.62/1.97 = 1.84.
Since this number is less than 3, we may assume that there are
no gross errors.
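The whole calculation of this section can be retraced in a few lines. The sketch below is not the author's computation and rests on one loud assumption: only thirteen of the fourteen tabulated latitudes are legible in this copy, and the value 32.96" used for the remaining one (placed second) is inferred by us from the printed totals $\sum x_k = 19.87$ and $\sum x_k^2 = 78.49$; it is not read from the table.

```python
import math

# Seconds of arc of the fourteen latitudes 41°19'...".
# ASSUMPTION: the second entry, 32.96, is inferred from the printed totals
# (sum of x_k = 19.87, sum of x_k^2 = 78.49); it is not legible in the table.
seconds = [33.10, 32.96, 28.95, 30.08, 28.81, 31.16, 31.38,
           32.15, 34.89, 33.34, 32.39, 31.51, 31.35, 27.80]

x0 = 30.0                                # origin: 41°19'30"
x = [s - x0 for s in seconds]            # column III
n = len(x)
x_mean = round(sum(x) / n, 2)            # mean rounded to hundredths, as in the text
dev = [xk - x_mean for xk in x]          # column IV
S = sum(d * d for d in dev)              # sum of column V
sigma = math.sqrt(S / (n - 1))           # formula (14.20)
sigma_mean = sigma / math.sqrt(n)        # formula (14.11)
ratio = max(abs(d) for d in dev) / sigma # gross-error check

print(round(sum(x), 2), x_mean, round(sum(dev), 2))
print(round(S, 2), round(sigma, 2), round(sigma_mean, 2), round(ratio, 2))
```

Under this assumption the mean, the mean square errors, and the ratio used in the gross-error check agree with the figures of column VI; the sum of squares differs from the printed 50.28 only in the last digit, because the squares are not rounded individually here.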


Chapter 15 


ANALYSIS OF MEASUREMENTS 
WHICH ARE NOT 
EQUALLY PRECISE 


73. THE CONCEPT OF UNEQUALLY PRECISE 
MEASUREMENTS. WEIGHTED MEASUREMENTS 


In practice, we often can get the most reliable determination of 
some quantity or other by comparing measurements made on 
different instruments and by different techniques. This is especially
true of the fundamental astronomic constants. One of these fundamental
constants is the average distance between the earth and
the sun or, equivalently, the parallax of the sun. The solar parallax
is determined in a number of ways, which give slightly different 
results. 

The simplest example of unequally precise measurements is 
the case of a set of values that are not directly measured but that 
are deduced from equally precise measurements, the number of 
which is different in the case of different deductions. 

Suppose that we have made $m_1$ equally precise measurements
and that we have found the most probable value $x_1$ of the quantity
being measured. Suppose that we now make a second series of
$m_2$ measurements of the same quantity, either at the same or at
another observatory, and that these measurements give a number
$x_2$. Suppose that a third set of $m_3$ measurements gives the result
$x_3$, etc. We now have the problem of deriving the most probable
value of the measured quantity on the basis of these results
$x_1, x_2, \ldots, x_n$.

Let us suppose that the basic measurements from which these
numbers are derived are equally precise (for example, that they
were made on the same type of instrument though at different
observatories). We denote by $\sigma$ the mean square error of a single
measurement. From formula (14.11) of Section 69, for the mean square
errors of the numbers $x_1, x_2, \ldots, x_n$, we have the expressions

\[ \sigma_k = \frac{\sigma}{\sqrt{m_k}}, \quad k = 1, 2, \ldots, n. \tag{15.1} \]

Since the mean square errors of the given values $x_1, x_2, \ldots, x_n$
are different, these quantities cannot be considered equally precise.

In other cases, the given values $x_1, x_2, \ldots, x_n$ must be considered
unequally precise on the basis of information regarding
the various conditions under which the measurements are made,
the varying accuracy of the instruments used, etc.

The mean square errors $\sigma_1, \sigma_2, \ldots, \sigma_n$ of the measurements
that gave these values are not known in all problems of this type.
Sometimes, on the basis of general information regarding the
conditions under which the measurements are made and the ways
in which $x_1, x_2, \ldots, x_n$ are determined, it is known that the measurements
are unequally precise, but there are no definite numerical
criteria for the degree of inequality of precision.

The formal test of unequal precision of the measurements is
the fact that the mean square errors $\sigma_1, \sigma_2, \ldots, \sigma_n$ of the measurements
that gave the random values $x_1, x_2, \ldots, x_n$, respectively,
are different. Let us assume that $\sigma_1, \ldots, \sigma_n$ are unknown.

In considering unequally precise measurements, it is convenient
to use the numbers $p_1, p_2, \ldots, p_n$, known as the weights
of the measurements, instead of the quantities $\sigma_1, \sigma_2, \ldots, \sigma_n$. These
are defined by the equations

\[ p_k = \frac{\sigma_0^2}{\sigma_k^2}, \tag{15.2} \]

where $\sigma_0$ is an arbitrary positive number. From the definition, it
follows

(a) that the weights of unequally precise measurements are
inversely proportional to their variances (that is, to the squares
of the mean square errors) and

(b) that the weights are relative numbers; that is, all the
weights may be multiplied or divided by a single number (this
being true because of the arbitrary choice of $\sigma_0$).

There is a simple interpretation of the proportionality constant
$\sigma_0^2$. If there exists a $k = i$ such that $\sigma_i = \sigma_0$, then $p_i = 1$. Therefore,
we may say that $\sigma_0$ is the mean square error of a measurement of
unit weight. Sometimes, we say briefly that $\sigma_0$ is the mean square
error per unit weight. Sometimes, $\sigma_0$ is chosen in such a way that
the sum of the weights for all the measurements is equal to unity.
Then, all the weights will be fractions. In astronomical work, we
usually choose $\sigma_0$ in such a way that all the weights are integers.

We should mention one case, referred to above, in which the
weights can be determined immediately. Suppose that the numbers
$x_1, x_2, \ldots, x_n$ are the arithmetic means obtained from $m_1, m_2, \ldots, m_n$
equally precise observations. It is clear from formulas (15.1)
for the $\sigma_k$ that the $\sigma_k^2$ are inversely proportional to the numbers
$m_k$. But, according to (15.2), the weights are inversely proportional
to the $\sigma_k^2$. Therefore, in the present case, the weights are directly
proportional to the numbers of observations $m_k$. In this case, we
take for the weights those numbers of observations from which the
means were obtained. This can be stated in another way. Each
mean obtained from $m_k$ observations is equivalent to $m_k$ equally
precise values. Therefore, $m_k$ is the ‘‘weight’’ of the number $x_k$.

Thus, the method of determining the weights of unequally 
precise measurements consists in the following: 

1. If the $\sigma_k$ are known, we choose $\sigma_0$ arbitrarily. It is advisable
to take a value close to the mean of the $\sigma_k$. We calculate the weights
from the formula

$$p_k = \frac{\sigma_0^2}{\sigma_k^2}.$$

2. If $x_1, x_2, \ldots, x_n$ are the arithmetic means obtained from sets
of $m_1, m_2, \ldots, m_n$ equally precise measurements, we take these
numbers for the weights.

3. If neither of the above conditions is satisfied but, for 
some reason or other, we must consider the observations as un- 
equally precise, the weights are assigned by the investigator some- 
what arbitrarily (though in such a way that the most precise 
observations have the greatest weights). 
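
The first two rules above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample numbers are hypothetical, and the default choice of $\sigma_0$ (the mean of the $\sigma_k$) merely follows the advice given in rule 1.

```python
def weights_from_sigmas(sigmas, sigma0=None):
    """Return weights p_k = sigma0**2 / sigma_k**2, as in formula (15.2).

    sigma0 is arbitrary; if omitted, the mean of the sigma_k is used,
    following the advice given in rule 1.
    """
    if not sigmas or any(s <= 0 for s in sigmas):
        raise ValueError("all mean square errors must be positive")
    if sigma0 is None:
        sigma0 = sum(sigmas) / len(sigmas)
    return [sigma0 ** 2 / s ** 2 for s in sigmas]


def weights_from_counts(counts):
    """Rule 2: a mean of m_k equally precise observations gets weight m_k."""
    if any(m <= 0 for m in counts):
        raise ValueError("observation counts must be positive")
    return [float(m) for m in counts]


if __name__ == "__main__":
    sigmas = [0.2, 0.4, 0.1]  # hypothetical mean square errors
    print(weights_from_sigmas(sigmas, sigma0=0.2))
    print(weights_from_counts([4, 9, 1]))
```

Because the weights are relative numbers, a different choice of `sigma0` rescales every weight by the same factor and leaves their ratios unchanged.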


74. THE MOST PROBABLE VALUE OF THE 
MEASURED QUANTITY 


Suppose that values $x_k$ with weights $p_k$ are given. Regardless of
the method by which the weights are obtained, we can, from the
definition of weights (15.2), write

$$p_k = \frac{\sigma_0^2}{\sigma_k^2},$$

where $\sigma_0$ is an as yet unknown factor. (It has a definite value,
however, because the choice of the given numbers $p_k$ means that
a definite value is assigned to the number $\sigma_0$.) For the moment,
let us assume that $\sigma_0$ is given. From the weights and the value
of $\sigma_0$ we determine the mean square errors of the measurements:

$$\sigma_k = \frac{\sigma_0}{\sqrt{p_k}}. \tag{15.3}$$

If the numbers $\sigma_k$ are given instead of the weights, we assign a
value to $\sigma_0$ and calculate the weights from the same formula. The
normal distribution law of the errors can be written in the form

$$\frac{1}{\sigma_k\sqrt{2\pi}}\exp\left[-\frac{\delta^2}{2\sigma_k^2}\right] = \frac{\sqrt{p_k}}{\sigma_0\sqrt{2\pi}}\exp\left[-\frac{p_k\delta^2}{2\sigma_0^2}\right]$$

for each measurement.




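Suppose that some hypothesis is made regarding the unknown
quantity $a$. Then, for each measurement, we can determine the
error

$$\delta_k = x_k - a$$

and can calculate the probability of obtaining an error close to $\delta_k$.
We denote this probability by $P(\delta_k < \delta < \delta_k + \Delta\delta)$ or, more briefly,
$P(\delta \approx \delta_k)$.

From the normal law,

$$P(\delta \approx \delta_k \mid a) = \frac{\sqrt{p_k}\,\Delta\delta}{\sigma_0\sqrt{2\pi}}\exp\left[-\frac{p_k\delta_k^2}{2\sigma_0^2}\right].$$
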
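Assuming, as in the preceding chapter, that the results of the
measurements are mutually independent random variables, we use
the theorem on multiplying probabilities to obtain the probability
of a set of errors close to $\delta_1, \delta_2, \ldots, \delta_n$:

$$P(\delta \approx \delta_1, \delta_2, \ldots, \delta_n \mid a) = \left(\frac{\Delta\delta}{\sigma_0\sqrt{2\pi}}\right)^n \sqrt{p_1 p_2 \cdots p_n}\,\exp\left[-\frac{\sum_{k=1}^n p_k\delta_k^2}{2\sigma_0^2}\right]. \tag{15.4}$$

Under the hypothesis chosen regarding the quantity $a$, the errors
$\delta_1, \delta_2, \ldots, \delta_n$ are uniquely determined by the numbers $x_1, x_2, \ldots, x_n$.
Therefore,

$$P(\delta \approx \delta_1, \delta_2, \ldots, \delta_n \mid a) = P(x \approx x_1, x_2, \ldots, x_n \mid a),$$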


that is, the right side of the preceding equation gives the conditional
probability that the results of the measurements will be close to
those obtained for the given value of $a$. This probability is a function
of $a$. Let us assume that prior to the measurements all the
hypotheses regarding the quantity $a$ are equally probable in some region.
From the corollary to Bayes’ formula, the a posteriori probabilities
of the hypotheses after the event has occurred are proportional to
the conditional probabilities of the event under those hypotheses.
Here, the set of results of the measurements $(x_1, x_2, \ldots, x_n)$ is a
random event. Therefore, we may write


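$$P(a \mid x_1, x_2, \ldots, x_n) = C\exp\left[-\frac{\sum_{k=1}^n p_k\delta_k^2}{2\sigma_0^2}\right], \tag{15.5}$$

where $C$ includes the proportionality constant and the factors

$$\left(\frac{\Delta\delta}{\sigma_0\sqrt{2\pi}}\right)^n \sqrt{p_1 p_2 \cdots p_n}.$$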




It follows from the equation above that the most probable hypothesis
with regard to $a$ will be the one that gives the quantity

$$S(a) = \sum_{k=1}^n p_k\delta_k^2 \tag{15.6}$$

its minimum value. We replace $\delta_k$ with $x_k - a$ and we find the
value of $a$ at which $S(a)$ has its minimum value. When we
differentiate $S(a)$ and set the derivative equal to 0, we obtain

$$\frac{dS}{da} = -2\sum_{k=1}^n p_k(x_k - a) = 0, \qquad a\sum_{k=1}^n p_k = \sum_{k=1}^n p_k x_k. \tag{15.7}$$


For brevity in writing, we define 


$$p = \sum_{k=1}^n p_k. \tag{15.8}$$

From (15.7), we obtain the expression

$$x_p = \frac{\sum_{k=1}^n p_k x_k}{p} \tag{15.9}$$

for the most probable value $x_p$ of the quantity $a$. This quantity
is called the weighted mean of the unequally precise measure- 
ments. Thus, when unequally precise measurements are made, 
we need to take the weighted mean of the results of the measure- 
ments as the most probable value of the quantity being measured. 

Formula (14.9) is a special case of the formula that we have 
just written. If the measurements are equally precise, we may 
take the weight of each measurement as being equal to unity and 
obtain the simple mean.

It is easy to see that the weighted mean has the following 
property. The weighted sum of all deviations from the weighted 
mean is equal to 0. For, on the basis of (15.9), we have 


$$\sum_{k=1}^n p_k(x_k - x_p) = \sum_{k=1}^n p_k x_k - x_p\sum_{k=1}^n p_k = \sum_{k=1}^n p_k x_k - \frac{\sum_{k=1}^n p_k x_k}{p}\,p = 0. \tag{15.10}$$
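
As a small illustration of (15.9) and of the property (15.10), the following Python sketch computes a weighted mean and the weighted sum of the deviations from it. The function names and sample data are hypothetical and serve only to make the formulas concrete.

```python
def weighted_mean(values, weights):
    """Weighted mean x_p = sum(p_k * x_k) / sum(p_k), formula (15.9)."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    p = sum(weights)
    if p <= 0:
        raise ValueError("the sum of the weights must be positive")
    return sum(w * x for w, x in zip(weights, values)) / p


def weighted_deviation_sum(values, weights):
    """Weighted sum of deviations from the weighted mean; zero by (15.10)."""
    x_p = weighted_mean(values, weights)
    return sum(w * (x - x_p) for w, x in zip(weights, values))


if __name__ == "__main__":
    values = [1.0, 2.0, 4.0]   # hypothetical measurements
    weights = [1.0, 2.0, 1.0]  # hypothetical weights
    print(weighted_mean(values, weights))
    print(weighted_deviation_sum(values, weights))
```

With all weights equal to unity the function reduces to the simple mean, and multiplying every weight by the same factor leaves the result unchanged.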


75. THE MEAN SQUARE ERROR OF THE WEIGHTED
MEAN


The quantities $x_1, x_2, \ldots, x_n$ are the individual values of the random
variables $\xi_1, \xi_2, \ldots, \xi_n$ that are the possible results of the measurements.
Therefore, the weighted mean $x_p$ is one of the values
of the random variable

$$\xi_p = \frac{\sum_{k=1}^n p_k\xi_k}{p}.$$


Since $\xi_p$ is a linear function of the $\xi_k$, it obeys a normal distribution
law (this follows from the basic hypothesis of a normal law for
random errors and from the property of a normal law). Therefore,
to evaluate the possible spread of the values of $\xi_p$, it will be
sufficient to determine the variance of this quantity. From the theorem
on the variance of a linear function of mutually independent random
variables, we have

$$\sigma_{x_p}^2 = \sum_{k=1}^n \frac{p_k^2}{p^2}\,\sigma_k^2,$$

where the $\sigma_k^2$ are the variances of the random results of the
successive measurements. If we use the weights $p_k$ instead of the
$\sigma_k^2$ in accordance with formulas (15.2), we obtain

$$\sigma_{x_p}^2 = \frac{\sigma_0^2}{p^2}\sum_{k=1}^n p_k = \frac{\sigma_0^2}{p},$$

so that

$$\sigma_{x_p} = \frac{\sigma_0}{\sqrt{p}}. \tag{15.11}$$


This formula is exact, but the mean square error per unit 
weight is unknown except in those exceptional cases in which the 
quality of the measurements is carefully studied with precision 
measuring devices. The fact that the denominator is the square 
root of the sum of the weights is the basis for choosing the weights 
in such a way that their sum is equal to unity (thus using the fact 
that the weights are relative). When this condition holds, the 
mean square error of the weighted mean is equal to the mean
square error per unit weight.

Formula (14.11) of Section 70 can be considered a special case 
of formula (15.11). If the measurements are equally precise, the 
weight of each of them will be equal to unity. 
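
The remark that the weights are relative can be checked numerically. The sketch below (hypothetical function names and sample values) applies formula (15.11) and shows that the mean square error of the weighted mean does not depend on the choice of $\sigma_0$, since a change of $\sigma_0$ rescales the weights accordingly.

```python
import math


def mse_of_weighted_mean(sigma0, weights):
    """Formula (15.11): sigma0 divided by the square root of the sum of weights."""
    p = sum(weights)
    if p <= 0:
        raise ValueError("the sum of the weights must be positive")
    return sigma0 / math.sqrt(p)


def mse_from_sigmas(sigmas, sigma0):
    """Build weights p_k = sigma0**2 / sigma_k**2, then apply (15.11)."""
    weights = [sigma0 ** 2 / s ** 2 for s in sigmas]
    return mse_of_weighted_mean(sigma0, weights)


if __name__ == "__main__":
    sigmas = [0.2, 0.4, 0.1]  # hypothetical mean square errors
    # Two different choices of sigma0 give the same final error.
    print(mse_from_sigmas(sigmas, 0.2))
    print(mse_from_sigmas(sigmas, 1.0))
```

For equally precise measurements with unit weights the same function gives $\sigma_0/\sqrt{n}$, the special case mentioned above.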


76. THE MOST PROBABLE VALUE OF THE MEAN 
SQUARE ERROR OF A MEASUREMENT OF UNIT WEIGHT 


In Section 74, to find the most probable value of the quantity
measured, we used formula (15.4) to determine the conditional
probability of the set of measurements under a hypothesis regarding
the value of $a$ with the quantity $\sigma_0$ given. To find the most probable
value of the mean square error per unit weight, we use the same
expression (15.4), but now we treat $\sigma_0$ as unknown and make
hypotheses with regard to its value. Therefore, we single out the
factors in (15.4) containing $a$ and $\sigma_0$ and denote the remaining
factors by $C'$:

$$P(x \approx x_1, x_2, \ldots, x_n \mid a, \sigma_0) = C'\sigma_0^{-n}\exp\left[-\frac{\sum_{k=1}^n p_k\delta_k^2}{2\sigma_0^2}\right].$$

We assume that all the values of $a$ are equally probable and we
integrate the above expression from $-\infty$ to $+\infty$ with respect
to $a$. Then, on the basis of the generalized formula for the total
probability, we find the probability that the results of the
measurements will be close to those obtained for all values of $a$ under
the hypotheses made regarding the value of $\sigma_0$. Denoting this
probability by $P(x \approx x_1, x_2, \ldots, x_n \mid \sigma_0)$, we have

$$P(x \approx x_1, x_2, \ldots, x_n \mid \sigma_0) = C'\sigma_0^{-n+1}\int_{-\infty}^{+\infty}\frac{1}{\sigma_0}\exp\left[-\frac{\sum_{k=1}^n p_k\delta_k^2}{2\sigma_0^2}\right]da. \tag{15.12}$$


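To calculate the integral in (15.12), we rewrite the sum appearing
in the argument of the exponential:

$$\begin{aligned}
\sum_{k=1}^n p_k\delta_k^2 &= \sum_{k=1}^n p_k(x_k - a)^2 = \sum_{k=1}^n p_k\left[(x_k - x_p) + (x_p - a)\right]^2 \\
&= \sum_{k=1}^n p_k(x_k - x_p)^2 + 2\sum_{k=1}^n p_k(x_k - x_p)(x_p - a) + \sum_{k=1}^n p_k(x_p - a)^2.
\end{aligned}$$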


In the second term, we can take out the factor $(x_p - a)$ since it is a
constant, although an unknown one (since $a$ is unknown). We can
replace the other sum

$$\sum_{k=1}^n p_k(x_k - x_p)$$

with 0 on the basis of the property referred to above of the weighted
mean (cf. (15.10)). In the third term, the constant factor $(x_p - a)^2$
can be taken out of the summation and the sum of the weights
can be replaced by $p$. Then,

$$P(x \approx x_1, x_2, \ldots, x_n \mid \sigma_0) = C'\sigma_0^{-n+1}\exp\left[-\frac{\sum_{k=1}^n p_k(x_k - x_p)^2}{2\sigma_0^2}\right] \times \int_{-\infty}^{+\infty}\frac{1}{\sigma_0}\exp\left[-\frac{p(x_p - a)^2}{2\sigma_0^2}\right]da. \tag{15.13}$$
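We make one further obvious transformation:

$$P(x \approx x_1, x_2, \ldots, x_n \mid \sigma_0) = C'\sigma_0^{-n+1}\exp\left[-\frac{\sum_{k=1}^n p_k(x_k - x_p)^2}{2\sigma_0^2}\right]\frac{\sqrt{2\pi}}{\sqrt{p}} \times \int_{-\infty}^{+\infty}\frac{1}{\sqrt{2\pi}\left(\sigma_0/\sqrt{p}\right)}\exp\left[-\frac{(a - x_p)^2}{2\left(\sigma_0/\sqrt{p}\right)^2}\right]da. \tag{15.14}$$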

The integrand in this equation can be formally considered as the
probability density of the normally distributed ‘‘random’’ variable
$a$ with center of distribution $x_p$ and mean square deviation $\sigma_0/\sqrt{p}$.
Because of the property of the probability density, its integral
over the entire region is equal to unity. Therefore,

$$P(x \approx x_1, x_2, \ldots, x_n \mid \sigma_0) = C''\sigma_0^{-n+1}\exp\left[-\frac{\sum_{k=1}^n p_k(x_k - x_p)^2}{2\sigma_0^2}\right], \tag{15.15}$$

where $C'' = C'\sqrt{2\pi}/\sqrt{p}$.

We have found the conditional probability of the event
$(x_1, x_2, \ldots, x_n)$ under the hypothesis made regarding the value $\sigma_0$.


Let us now suppose that the values of $\sigma_0$ are equally probable
prior to the measurements in some region, which can here be
assumed small. From the corollary to Bayes’ formula, the
probabilities of the hypotheses after the measurements are
proportional to the conditional probabilities of the event (the results
of the measurements) under the hypotheses made.

Therefore,

$$P(\sigma_0 \mid x_1, x_2, \ldots, x_n) = KC''\sigma_0^{-n+1}\exp\left[-\frac{\sum_{k=1}^n p_k(x_k - x_p)^2}{2\sigma_0^2}\right], \tag{15.16}$$

where $K$ is a proportionality constant.




We may now seek the most probable value of $\sigma_0$ that can be
obtained from the measurements. To do this, it will be sufficient
to find that value of $\sigma_0$ at which the right side of (15.16) has its
maximum. It is clear from the last formula that $P$ will be greatest
when the product of the power of $\sigma_0$ and the exponential is greatest.
For brevity in writing, we define

$$S_0 = \sum_{k=1}^n p_k(x_k - x_p)^2, \qquad f(\sigma_0) = \sigma_0^{-n+1}\exp\left[-\frac{S_0}{2\sigma_0^2}\right].$$

Taking the logarithm of the second equation and differentiating,
we obtain

$$\ln f(\sigma_0) = (1 - n)\ln\sigma_0 - \frac{S_0\sigma_0^{-2}}{2},$$

$$\frac{df(\sigma_0)}{d\sigma_0} = f(\sigma_0)\left\{S_0\sigma_0^{-3} - (n - 1)\sigma_0^{-1}\right\},$$

$$\frac{d^2f(\sigma_0)}{d\sigma_0^2} = f'(\sigma_0)\left\{S_0\sigma_0^{-3} - (n - 1)\sigma_0^{-1}\right\} + f(\sigma_0)\left\{-3S_0\sigma_0^{-4} + (n - 1)\sigma_0^{-2}\right\}.$$

If we set the first derivative equal to 0, we obtain a value of $\sigma_0$
representing an extremum:

$$\sigma_{0,e}^2 = \frac{S_0}{n - 1}.$$

(The $e$ in the subscript indicates an extremum.) The first term
in the expression for the second derivative vanishes at $\sigma_0 = \sigma_{0,e}$, and,
after some obvious manipulations, the second term gives

$$\left(\frac{d^2f}{d\sigma_0^2}\right)_{\sigma_0 = \sigma_{0,e}} = -2(n - 1)\,\sigma_{0,e}^{-n-1}\,e^{-\frac{n-1}{2}} < 0. \tag{15.17}$$

It follows from (15.17) that the value found for $\sigma_{0,e}$ is the most
probable one. We shall henceforth denote this most probable
value $\sigma_{0,e}$ of the mean square error per unit weight simply by $\sigma$
without subscript. Then,

$$\sigma = \sqrt{\frac{\sum_{k=1}^n p_k(x_k - x_p)^2}{n - 1}}. \tag{15.18}$$


Obviously, formula (14.20) of the preceding chapter is a particular 
case of (15.18) because in the case of equally precise measurements, 
the weight of each can be taken equal to unity. 
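
Formula (15.18) translates directly into code. The following is a minimal sketch with hypothetical names and sample data; it assumes strictly positive weights and at least two measurements.

```python
import math


def unit_weight_error(values, weights):
    """Most probable mean square error per unit weight, formula (15.18)."""
    n = len(values)
    if n < 2 or n != len(weights):
        raise ValueError("need at least two values and matching weights")
    p = sum(weights)
    x_p = sum(w * x for w, x in zip(weights, values)) / p
    # S_0: weighted sum of squared deviations from the weighted mean.
    s0 = sum(w * (x - x_p) ** 2 for w, x in zip(weights, values))
    return math.sqrt(s0 / (n - 1))


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0]  # hypothetical measurements
    print(unit_weight_error(values, [1.0, 1.0, 1.0]))
    # Doubling all weights changes sigma by sqrt(2), but sigma / sqrt(p),
    # the error of the weighted mean, is unchanged.
    print(unit_weight_error(values, [2.0, 2.0, 2.0]))
```

With unit weights the function reduces to the ordinary estimate for equally precise measurements, in line with the remark on formula (14.20).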

We note the following fact, which is of significance in actual
work. If the mean square errors of unequally precise
measurements are given, to calculate the weights, we need to introduce a
‘‘preliminary’’ value of the mean square error per unit weight.
Let us denote it by $\sigma_0$. After the weights are calculated, the mean
square errors of the measurements are discarded, as it were,
and only the weights are used. Therefore, the most probable
value $\sigma$ of the mean square error per unit weight can again be determined
from the weights and the results of the measurements. Sometimes,
the value of $\sigma$ differs rather sharply from $\sigma_0$. Usually, we take $\sigma$
rather than $\sigma_0$ for the final value (since $\sigma_0$ is a formally chosen
constant in no way connected with the measurements that are
made). In calculating $\sigma$, we depend on observations, and therefore
this number may be considered to reflect to some degree the
properties of the measurements. The number $\sigma_0$, on the other hand,
is only an auxiliary number used to determine the weights of the
measurements.

In formula (15.11) for determining the mean square error of
the weighted mean, we must replace $\sigma_0$ with $\sigma$ after calculating
the most probable value of the mean square error per unit weight.
If there are measurements that are in sharp disagreement with
most of the remaining measurements, then these exceptional
measurements should be assumed to contain gross errors and must be
excluded according to the 3-sigma rule. If the $\sigma_k$ are given,
measurements for which

$$|x_k - x_p| > 3\sigma_k$$

are discarded. If it is the weights and not the $\sigma_k$ that are given,
the $\sigma_k$ must be calculated from the value of $\sigma$ and the weights $p_k$
and the 3-sigma rule must be used. The $\sigma_k$ need be calculated
only for those measurements in which $x_k$ differs appreciably
from $x_p$. If measurements with gross errors are found when the
3-sigma rule is used (sometimes the 2- or 4-sigma rule is
employed instead), these measurements must either be discarded or
be verified by some other supplementary means.
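
The screening step for the case in which only the weights are given might be sketched as follows. The function name, the `rule` parameter, and the sample data are hypothetical; the sketch recovers $\sigma_k = \sigma/\sqrt{p_k}$ from (15.3) with $\sigma$ taken from (15.18), and assumes strictly positive weights.

```python
import math


def flag_gross_errors(values, weights, rule=3.0):
    """Return indices k for which |x_k - x_p| > rule * sigma_k.

    sigma_k is recovered from the weights as sigma / sqrt(p_k), where
    sigma is the most probable mean square error per unit weight (15.18).
    """
    n = len(values)
    if n < 2 or n != len(weights):
        raise ValueError("need at least two values and matching weights")
    p = sum(weights)
    x_p = sum(w * x for w, x in zip(weights, values)) / p
    s0 = sum(w * (x - x_p) ** 2 for w, x in zip(weights, values))
    sigma = math.sqrt(s0 / (n - 1))
    flagged = []
    for k, (x, w) in enumerate(zip(values, weights)):
        sigma_k = sigma / math.sqrt(w)
        if abs(x - x_p) > rule * sigma_k:
            flagged.append(k)
    return flagged


if __name__ == "__main__":
    values = [10.0] * 20 + [15.0]  # hypothetical data with one wild value
    weights = [1.0] * 21
    print(flag_gross_errors(values, weights))
```

Note that a gross error also inflates $\sigma$ itself, so with only a few measurements the rule may fail to flag anything; flagged values should then be discarded or verified by other means, as stated above.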


77. THE PROCEDURE FOR ANALYZING UNEQUALLY 
PRECISE MEASUREMENTS OF A FIXED QUANTITY. 
AN EXAMPLE 


In calculating the geographical latitude of the Moscow State University Observatory on
the Presna, Schweitzer’s observations on nine stars were used. Each star was observed
several times. Therefore, the mean square errors, shown in the third column of the
following table, were used:


[Table with columns I to X for the nine stars. The recoverable entries of column VIII,
$p_k(x_k - x)$, are +0.33, -0.21, +0.84, +0.38, +0.04, -0.04, -0.50, -0.85, -0.03.]


The first column gives the number of observations. Since the latitude is determined
from observations of the different stars, in practice this column often gives not the
number of observations, but the stars that were used. In the present example,
observations were made on five stars from the constellation Draco, one star from Cygnus, and
three stars from Ursa Major.

Column II gives the numbers $x_k = \varphi_k - \varphi_0$. Here, $\varphi_0$ = 55°45′19″; that is, in practice,
just as in the example of Chapter 13, a new origin is chosen in such a way that all the
$x_k$ are positive.

Column III gives the mean square errors of the values obtained by the method
described in the preceding chapter.

Columns IV and V contain a calculation of the weights from the mean square errors
by use of the formula

$$p_k = \frac{\sigma_0^2}{\sigma_k^2},$$

in which a preliminary value of 0.20″ was taken for $\sigma_0$. The ratios $\sigma_0/\sigma_k$ are calculated
first (column IV). The squares of these numbers are given by column V. Column VI,
giving the values $p_k x_k$, is necessary for calculating the weighted mean, that is, for
calculating the most probable value of the latitude. Therefore, the sums of the numbers
appearing in columns V and VI are written in the last line. The values in columns VII,
VIII, and IX are explained by the column headings. We need only remember that the
weighted mean is denoted by $x$ without subscript for simplification in writing.

Column X is necessary for a check of the weighted sum of the squares of the deviations
from the weighted mean. This check can be made by the easily derived formula

$$\sum_{k=1}^n p_k(x_k - x)^2 = \sum_{k=1}^n p_k x_k^2 - px^2, \tag{15.19}$$

where $p$ is defined by equation (15.8).
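
One way to obtain the check formula (15.19) is to expand the square and use only the definition of the weighted mean, $\sum p_k x_k = px$, together with (15.8):

$$\begin{aligned}
\sum_{k=1}^n p_k(x_k - x)^2 &= \sum_{k=1}^n p_k x_k^2 - 2x\sum_{k=1}^n p_k x_k + x^2\sum_{k=1}^n p_k \\
&= \sum_{k=1}^n p_k x_k^2 - 2x\,(px) + px^2 \\
&= \sum_{k=1}^n p_k x_k^2 - px^2.
\end{aligned}$$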

Column VIII is necessary for a check of the calculation of the deviations from the
weighted mean and their multiplication by the weights. The check consists in determining
the sum of the numbers in column VIII. If the division is precise in the calculation of the
weighted mean, the sum will, because of the property of a weighted mean, have to be
exactly equal to 0. Of course, such cases happen rarely since the weighted mean is only
approximately calculated (usually with rounding off). Therefore, the limiting error of
the sum is equal to the product of one half the unit of the last digit in the number $x$ and
the sum of the weights, provided the sum is obtained on a calculating machine by
accumulation. (In this check, the numbers $x_k$ and $p_k$ can be formally considered exact.) If the
sum of the numbers in column VIII is less than this limiting error, we may assume that
there were no errors in this part of the work. Otherwise, it will be necessary to find the
error.

The procedure described above is rather detailed. The person performing an
experiment can shorten it somewhat as follows. First, to calculate the weights, he may calculate
the numbers $\sigma_k^2$ and then obtain the weights by dividing $\sigma_0^2$ by the numbers obtained.
Second, if he does not intend to make the check of the weighted sum of the squares of the
deviations from the weighted mean, column VI can be excluded and the sum of the
numbers in this column can be obtained on a calculating machine by the method of
accumulation. Third, column IX may be omitted and the sum obtained again by the method of
accumulation.


A portion of the calculations is performed outside the scheme: determination of the
weighted mean:

$$x = \frac{5.32''}{6.93} = 0.77'', \qquad \varphi = 55^\circ 45' 19'' + 0.77'' = 55^\circ 45' 19.77'';$$


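calculation of the mean square error per unit weight:

$$\sigma = \sqrt{\frac{\sum_{k=1}^n p_k(x_k - x)^2}{n - 1}} \approx \sqrt{\frac{2.0}{8}} \approx 0.50'';$$

calculation of the mean square error of the weighted mean:

$$\sigma_x = \frac{\sigma}{\sqrt{p}} = \frac{0.50''}{\sqrt{6.93}} = \frac{0.50''}{2.63} = 0.19''.$$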


Calculations for checking:

The absolute value of the sum of the numbers in column VIII is equal to 0.04″. To
show the admissibility of such a deviation from 0, we must calculate the limiting error
of this sum, taking into account the way in which it was derived (see Section 6). The
limiting error is computed here in a manner somewhat more complicated than that
explained above. In addition to the error resulting from the fact that the weighted mean
is determined only approximately, we need to consider the error resulting from
discarding the digits in the individual terms.

Since the numbers $p_k$ and $x_k$ can be considered as exact, the limiting error of each
of the terms added must be taken equal to one half the unit of the last digit, that is,
0.005. Therefore, the total value found for the error of the sum is

$$9 \cdot 0.5 \cdot 10^{-2} + 0.5 \cdot 10^{-2} \cdot 6.93 = 8 \cdot 10^{-2}.$$

The value of the sum is only one half its limiting error. Therefore, there is no reason
for supposing that there are errors in the calculations.

A check of the weighted sum of the squares of the deviations from the mean by using
the formula given earlier yields

$$x^2 = 0.59, \qquad px^2 = 4.1, \qquad \left(\sum_{k=1}^n p_k(x_k - x)^2\right) = 6.1 - 4.1 = 2.0.$$

The discrepancy between the value of the sum (in the parentheses) obtained in the check
and the value found in the original calculations is insignificant. Therefore, we may
assume that there are no errors here.

The result may be written

$$\varphi = 55^\circ 45' 19.77'' \pm 0.19'' \quad \text{(mean square error)}.$$

Since the hundredth parts of a second are not sufficiently reliable, we may write the
result as

$$\varphi = 55^\circ 45' 19.8'' \pm 0.2''.$$
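
The arithmetic of this example can be retraced from the column totals quoted in the text. The sketch below is only a check of that arithmetic: the function name is hypothetical, the totals 6.93, 5.32 and 6.1 are taken as read from the text, and the individual table entries are not used.

```python
import math


def latitude_summary(sum_p, sum_px, sum_px2, n, half_unit):
    """Retrace the worked example from the column totals.

    sum_p     -- sum of the weights (column V)
    sum_px    -- sum of p_k * x_k (column VI)
    sum_px2   -- sum of p_k * x_k**2 (used in the check (15.19))
    n         -- number of stars
    half_unit -- one half the unit of the last digit retained
    """
    x = round(sum_px / sum_p, 2)               # weighted mean, rounded as in the text
    s0 = sum_px2 - sum_p * x ** 2              # check formula (15.19)
    sigma = math.sqrt(s0 / (n - 1))            # formula (15.18)
    sigma_x = sigma / math.sqrt(sum_p)         # formula (15.11) with sigma for sigma_0
    limit = n * half_unit + half_unit * sum_p  # limiting error of the column VIII sum
    return x, s0, sigma, sigma_x, limit


if __name__ == "__main__":
    x, s0, sigma, sigma_x, limit = latitude_summary(6.93, 5.32, 6.1, 9, 0.005)
    print(round(x, 2), round(s0, 1), round(sigma, 2), round(sigma_x, 2), round(limit, 2))
```

The rounded values agree with those quoted in the text: a weighted mean of 0.77″, a weighted sum of squares of about 2.0, a mean square error of the weighted mean of 0.19″, and a limiting error of about 0.08.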


Chapter 16 


DETERMINATION OF SEVERAL 
UNKNOWNS IN EQUATIONS BY 
THE METHOD OF LEAST SQUARES 


78. CONDITIONAL AND NORMAL EQUATIONS. 
LEGENDRE'S PRINCIPLE 


In astronomical work and in a number of other applied disciplines, 
problems are frequently encountered in which the quantities to be 
determined are not observed directly. Instead of the quantities 
themselves, certain other quantities, which are functions of the 
unknowns, can be determined from observations. * 


Example 1. Suppose that observations yield values $x_k$ and $y_k$ for the quantities $x$
and $y$. Let us suppose that these quantities are related as follows:

$$y = a + bx + cx^2,$$

where $a$, $b$, and $c$ are coefficients that are to be determined. Each observation gives
an equation with three unknowns

$$y_k = a + bx_k + cx_k^2 \qquad (k = 1, 2, \ldots, n),$$

where $n$ is the number of measurements.

Example 2. In examining certain instruments, the problem arises of determining the
errors in the individual components of the instrument in question. Knowledge of the
errors in the components makes it possible to calculate the error inherent in the entire
instrument under different conditions. Direct measurements of the errors in the
components would require taking the instrument apart and making measurements with each
component separately. Such a course is not always convenient and it may not give reliable
results because in reassembling the instrument errors in some of the components
(for example, the sizes of gaps) may change. For the sake of clarity, let us consider an
instrument for measuring angles. Let us proceed as follows. First we measure several
angles of different magnitudes with a much more precise instrument than the one that we
are testing, so that the result may be considered virtually exact in comparison with the
measurements of the instrument that we are testing. Measurements of the angles taken
with the instrument that we are testing will yield values different from those treated as
exact values. The differences can be considered as exact errors of the instrument for


*Thus, direct measurement of the desired quantities is replaced by measurement of
other quantities; that is, the unknown quantities are determined by means of other
quantities. Therefore, it is sometimes said that indirect or intermediary measurements
are made. This terminology, however, is somewhat unfortunate.




the different values of the angle. Knowing the construction of the instrument, we can
think of its total error as a function of the errors of the components. Applying this
functional relationship for the various values of the angle, we obtain equations of the form

$$\Delta x_k = f_k(\Delta\alpha, \Delta\beta, \Delta\gamma, \ldots),$$

where $\Delta x_k$ is the error in the angle, $\Delta\alpha, \Delta\beta, \ldots$ are the errors in the components of the
instrument, and $f_k$ is the functional relationship. The subscript $k$ shows that, under
certain conditions, the form of the functional relationship may depend on the quantity
measured.

Various examples of problems similar to this are encountered in astrometry, celestial
mechanics, and other divisions of astronomy.


In its general form, the problem is as follows. If $a, b, c, \ldots$
are the quantities that we wish to find, we first obtain instead the
quantities $l_k$, which represent particular functions of the unknowns
$a, b, c, \ldots$. Each observation gives a conditional equation of the
form

$$f_k(a, b, c, \ldots) = l_k, \tag{16.1}$$

where $k$ is the number of the particular observation in question. It
is assumed that the $f_k$ are differentiable functions of the arguments
$a, b, c, \ldots$. Generally speaking, these functions also contain
parameters, which vary from observation to observation.

If there were no random errors in the observations or if they
were so small that we might neglect them, it would, in the general
case, be sufficient to make just as many observations as there
are unknowns, set up the equations, and solve them. Only with
special combinations of values of the $l_k$ could it happen that some
of the equations would be consequences of the others, so that the
number of equations would have to be increased.

But in problems of the usual type, the numbers $l_k$ contain
random errors large enough so that they cannot be neglected. If
we set up only as many independent equations as we have unknowns,
these errors will show up in full force in the solutions to the system.
In order to hope for a partial cancellation of the errors, we need
to make considerably more observations and, consequently, set up
considerably more equations of the form (16.1) than we have
unknowns.* Then, we have a system of equations of the form

$$f_k(a, b, c, \ldots) = l_k \qquad (k = 1, 2, \ldots, n), \tag{16.2}$$

where $n$ is much greater than the number of unknowns, which,
from an algebraic point of view, is somewhat peculiar. Equations
of such a system are called conditional or initial. **


*The problem of the accumulation of errors in such systems and of their effect on
the overall error is very complicated and has not been studied in the case of
nonlinear equations.

**We note that there is confusion in the terminology used in this connection. According
to a remark of M. I. Idel’son, such equations are said to be conditional ‘‘in the
astronomical sense.’’ The problem consists in the fact that in geodesy the term ‘‘conditional
equations’’ is applied to exact equations giving the connection between the unknowns, for
example, equations like the one stating that the sum of the angles of a triangle is 180°.
(The sum of the measurements of the angles usually differs from 180° by a small amount.)
In geodesy, the equations of which we are speaking in the present section are called
initial. However, we shall refer to them as ‘‘conditional’’ equations.




The peculiarity about the system of conditional equations (16.2)
consists in the fact that, because of the random errors that appear
in them, the system is incompatible, even if the functional
relationships are exact (which is not always the case). This means that
values $a, b, c, \ldots$ which satisfy all the equations of the system
simultaneously do not exist. In other words, for any values
$a', b', c', \ldots$ that may be substituted into the system,

$$f_k(a', b', c', \ldots) - l_k = \delta_k, \tag{16.3}$$

where the discrepancies $\delta_k$ are not all equal to 0.


Because of the incompatibility of the system of conditional 
equations, we must make some agreement as to the method of 
solution and we must clarify the probabilistic meaning of the 
method chosen. Obviously, we would naturally choose a convention 
according to which the absolute values of the discrepancies will be 
as small as possible, but this cannot always be done, If there are 
as many equations as there are unknowns, we can make all the 
discrepancies equal to 0. In the usual case, however, the problem 
of minimizing the absolute values of the discrepancies is
indeterminate. For example, we may pick out as many equations as we
have unknowns and solve them. The discrepancies in these equations
will be equal to 0, but we will be able to say nothing about the
discrepancies in the remaining equations, which may be quite
large. Obviously, we may speak only of certain over-all conditions
to be imposed on the discrepancies. We mention two such
conditions.

It would be quite natural to require that the sum of the absolute 
values of the discrepancies be minimized. This condition was pro- 
posed by Edgeworth. However, solution of conditional equations by 
Edgeworth’s method has not attained widespread use. 

Another convention had already been proposed by Legendre 
and published by him. After the publication of Legendre’s work, 
Gauss stated that he had used such a convention over a period of 
several years; however, it is naturally referred to as Legendre’s 
principle. 

Legendre’s principle. If a system of equally precise conditional 
equations is given, let us agree to seek unknowns such that the 
sum of the squares of the discrepancies will be minimized. 
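
The difference between the two conventions is easiest to see in the simplest possible case, which is an assumption made here purely for illustration: a single unknown $a$ measured directly, so that the conditional equations are $a = l_k$. In that case the sum of the squares of the discrepancies is minimized by the arithmetic mean of the $l_k$, and the sum of their absolute values by the median. The names and data below are hypothetical.

```python
from statistics import mean, median


def sum_of_squares(a, ls):
    """Legendre's criterion for the conditional equations a = l_k."""
    return sum((a - l) ** 2 for l in ls)


def sum_of_abs(a, ls):
    """Edgeworth's criterion for the same equations."""
    return sum(abs(a - l) for l in ls)


if __name__ == "__main__":
    ls = [10.1, 9.9, 10.0, 10.2, 12.0]  # hypothetical measurements
    a_legendre = mean(ls)     # minimizes the sum of squares
    a_edgeworth = median(ls)  # minimizes the sum of absolute values
    print(a_legendre, sum_of_squares(a_legendre, ls))
    print(a_edgeworth, sum_of_abs(a_edgeworth, ls))
```

The one discordant value pulls the mean noticeably more than the median, which shows how differently the two criteria treat large discrepancies.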

This condition was soon accepted for completely understandable
reasons. Although it is impossible to ensure that the individual
discrepancies will be small, minimization of the sum of the squares
does ensure that the individual errors are bounded. The sum of
the squares of the discrepancies is an analytic function of the 
unknowns, provided, of course, the functions appearing in the 
conditional equations are analytic. Therefore, it is always possible 
to set up equations for determining the values of the unknowns that 
correspond to a minimum. Furthermore, as will be shown in the 
following section, Legendre’s principle has a simple probabilistic 
interpretation. 

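Following Legendre’s principle, let us form the sum of the
squares of the discrepancies

$$S = \sum_{k=1}^n \left[f_k(a, b, c, \ldots) - l_k\right]^2 \tag{16.4}$$

for the case of equally precise measurements, and let us write
the necessary conditions for a minimum of this sum:

$$\frac{\partial S}{\partial a} = 0, \qquad \frac{\partial S}{\partial b} = 0, \qquad \frac{\partial S}{\partial c} = 0, \quad \ldots \tag{16.5}$$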


These equations for $a, b, c, \ldots$ are called normal equations. We
can always obtain a system of equations containing as many
equations as there are unknowns to be determined. Therefore, the
problem is well defined in the general case.
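
For the quadratic relationship of Example 1, the conditions (16.5) are linear in $a$, $b$, $c$, and the normal equations can be written out and solved directly. The following pure-Python sketch is one possible illustration; the function names and the test data are hypothetical, and the plain Gaussian elimination used here is chosen only to keep the example self-contained.

```python
def normal_equations(xs, ys):
    """Normal equations for y = a + b*x + c*x**2 with equally precise y_k.

    Each conditional equation has coefficients (1, x_k, x_k**2); setting the
    partial derivatives of the sum of squares to zero gives a 3x3 system.
    """
    rows = [(1.0, x, x * x) for x in xs]
    matrix = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return matrix, rhs


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (m[r][n] - tail) / m[r][r]
    return solution


def fit_quadratic(xs, ys):
    """Return (a, b, c) minimizing the sum of squared discrepancies."""
    matrix, rhs = normal_equations(xs, ys)
    return tuple(solve(matrix, rhs))


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.1, 5.9, 17.2, 33.8, 57.1]  # hypothetical noisy observations
    print(fit_quadratic(xs, ys))
```

Five conditional equations are reduced here to three normal equations, one for each unknown.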

If the conditional equations are nonlinear, the normal equations
may be satisfied by several systems of values of the unknowns, and it is then necessary
to choose a system that will give the ‘‘minimum minimorum’’ to the
sum of the squares of the discrepancies. The existence of such a
minimum is ensured by the fact that $S$ is a positive-definite
function of the unknowns that can vanish only when all the
discrepancies are equal to 0. This is impossible if there are random
errors in the numbers $l_k$. Therefore, $S(a, b, c, \ldots)$ must have a
minimum.


79. THE PROBABILISTIC MEANING OF LEGENDRE'S 
PRINCIPLE 


Let us suppose that random errors are contained only in the results
of the measurements $l_k$. The functions $f_k$ may also contain the
results of measurements in the form of parameters, but we shall
assume that they do not contain random errors. This is legitimate
if the errors in the parameters are small in comparison with the
errors in the values for the $l_k$. Let us assume also, as in the
preceding chapters, that the measurements of the $l_k$ are mutually
independent and that the errors in the numbers $l_k$ obey a normal
law of distribution. Under these conditions, the discrepancies

$$\delta_k = f_k(a, b, c, \ldots) - l_k \tag{16.6}$$

represent exact errors, which obey the same normal law, by virtue
of the condition of equal precision. The variances in the successive
measurements are the same and the expectations of the errors are
equal to 0.

Let us assume for the moment that $a, b, c, \ldots$ are known and
that the errors $\delta_k$ have been calculated. The probability of obtaining
errors that are close in magnitude to the errors made when the
measurements were taken is given by

$$P_k(\delta_k < \delta < \delta_k + \Delta\delta) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{\delta_k^2}{2\sigma^2}\right]\Delta\delta, \tag{16.7}$$




where $\Delta\delta$ is a small positive number (see Chapter 11). Let us
agree to write the left side of this equation briefly as $P_k(\approx \delta_k)$.

Since $a, b, c, \ldots$ are unknown, we cannot obtain the numbers $\delta_k$.
We can only make various hypotheses concerning the quantities
$a, b, c, \ldots$, find the corresponding values of $\delta_k$, and calculate the
probabilities. To bring out the dependence of $\delta_k$ and $P_k$ on
$a, b, c, \ldots$, let us write equation (16.7) in the form

$$P_k(\approx \delta_k \mid a, b, c, \ldots) = \frac{\Delta\delta}{\sigma\sqrt{2\pi}}\exp\left[-\frac{\delta_k^2}{2\sigma^2}\right]. \tag{16.8}$$


This probability can also be treated as the probability of obtaining
the numbers $l_k$ when the measurements are made, since $\delta_k$ is
expressed in terms of $l_k$ and the nonrandom variables $f_k(a, b, c, \ldots)$.
Therefore, the expression for the probability is finally written in
the form

$$P_k(\approx l_k \mid a, b, c, \ldots) = \frac{\Delta\delta}{\sigma\sqrt{2\pi}}\exp\left[-\frac{\delta_k^2}{2\sigma^2}\right], \tag{16.9}$$


where 3 is treated as a knownquantityand 4, is defined by equation 


(16.6). 
The probability of obtaining a set of results of measurements
close to that obtained is a function of $a, b, c, \ldots$. It is
obtained from the theorem on multiplying probabilities of mutually
independent events (we are considering the measurements
independent):

$$P(\sim l_1, l_2, \ldots, l_n \mid a, b, c, \ldots) = \left(\frac{\Delta\delta}{\sigma\sqrt{2\pi}}\right)^{n} \exp\left(-\frac{\sum_{k=1}^{n} \delta_k^2}{2\sigma^2}\right). \tag{16.10}$$


If we make various hypotheses on the magnitudes of the numbers
$a, b, c, \ldots$, we shall obtain various probabilities of the actual
results of the measurements.

Let us suppose that all the hypotheses regarding the quantities 
(a, b, c,...) are equally probable in some region (which can be 
made arbitrarily small) in the neighborhood of the unknown exact 
values. Then, we may use the corollary to Bayes’ formula accord- 
ing to which the a posteriori probabilities of the hypotheses after 
the event has occurred are proportional to the conditional proba- 
bilities of the events under the corresponding hypotheses. Accord- 
ing to this corollary, 


$$P(a, b, c, \ldots \mid l_1, l_2, \ldots, l_n) = K \exp\left(-\frac{\sum_{k=1}^{n} \delta_k^2}{2\sigma^2}\right), \tag{16.11}$$




where $K$ is the product of a proportionality constant and a factor
that does not depend on $a, b, c, \ldots$. Thus, the probability of the
hypotheses concerning the values $a, b, c, \ldots$ is a function of these
values. Therefore, the question arises of determining those values
$a, b, c, \ldots$ at which the probability has a maximum. Since $a, b, c, \ldots$
appear only in the exponential, the hypothesis regarding $a, b, c, \ldots$
will be most probable at those values at which $\sum_{k=1}^{n} \delta_k^2$ is smallest.


This is simply Legendre's principle. Thus, we come to the
following conclusion:

If a system of equally precise conditional equations is given,
the most probable values of the unknowns are obtained when the
sum of the squares of the discrepancies is smallest.
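
The equivalence stated in this conclusion can be illustrated numerically. The following is only a minimal sketch under assumed data: the measurements, the value of sigma, and the grid of hypotheses are hypothetical, and the single unknown is measured directly, so that f_k(a) = a. The logarithm of the probability (16.10) is used, which does not change the location of the maximum.

```python
import math

def log_probability(deltas, sigma):
    """Logarithm of the product of normal densities in (16.10),
    omitting the constant factor (delta-delta)^n."""
    n = len(deltas)
    return (-n * math.log(sigma * math.sqrt(2 * math.pi))
            - sum(d * d for d in deltas) / (2 * sigma ** 2))

def sum_of_squares(deltas):
    """Sum of the squares of the discrepancies."""
    return sum(d * d for d in deltas)

# Hypothetical data: five equally precise direct measurements of one unknown a,
# so that f_k(a) = a and the discrepancies are delta_k = a - l_k.
measurements = [10.1, 9.8, 10.3, 9.9, 10.0]
sigma = 0.2  # assumed known mean square error

def discrepancies(a):
    return [a - lk for lk in measurements]

hypotheses = [9.5 + 0.01 * i for i in range(101)]  # trial values of a

# The hypothesis of greatest probability and the hypothesis of least
# sum of squares are selected independently of one another.
most_probable = max(hypotheses, key=lambda a: log_probability(discrepancies(a), sigma))
least_squares = min(hypotheses, key=lambda a: sum_of_squares(discrepancies(a)))
```

Both criteria select the same trial value, because the logarithm of (16.10) differs from the sum of the squares only by a constant term and a negative constant factor.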


80. GENERALIZATION OF LEGENDRE’S PRINCIPLE 
TO UNEQUALLY PRECISE CONDITIONAL EQUATIONS. 
REDUCTION OF UNEQUALLY PRECISE TO EQUALLY 
PRECISE EQUATIONS 


Suppose that for some reason the results of one's measurements
cannot be considered equally precise. Let us suppose that the
mean square errors $\sigma_1, \sigma_2, \ldots, \sigma_n$ are known for these
measurements, and that the weights $p_1, p_2, \ldots, p_n$ are known. We can again
calculate the probability of obtaining approximately the number
$l_k$ on taking the $k$th measurement, under a definite hypothesis
regarding the values $a, b, c, \ldots$:

$$P_k(\delta_k < \delta < \delta_k + \Delta\delta) = \frac{1}{\sigma_k\sqrt{2\pi}} \exp\left(-\frac{\delta_k^2}{2\sigma_k^2}\right) \Delta\delta, \tag{16.12}$$

where, according to (16.6),

$$\delta_k = f_k(a, b, c, \ldots) - l_k.$$


We introduce the weights

$$p_k = \frac{\sigma_0^2}{\sigma_k^2},$$

where $\sigma_0$ is the mean square error per unit weight. In the
parenthesized expression in (16.12) we write $l_k$ instead of $\delta_k$, since $\delta_k$
is a function of $l_k$ for given values of $a, b, c, \ldots$. Using the
abbreviated notation for the inequalities in the parenthesized expression
in (16.12), we obtain

$$P(\sim l_k \mid a, b, c, \ldots) = \frac{\sqrt{p_k}}{\sigma_0\sqrt{2\pi}} \exp\left(-\frac{p_k \delta_k^2}{2\sigma_0^2}\right) \Delta\delta.$$




As in the preceding section, we may write the probability of the 
set of results of the measurements as a function 


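$$P(\sim l_1, l_2, \ldots, l_n \mid a, b, c, \ldots) = \left(\frac{\Delta\delta}{\sigma_0\sqrt{2\pi}}\right)^{n} \prod_{k=1}^{n} \sqrt{p_k}\; \exp\left(-\frac{\sum_{k=1}^{n} p_k \delta_k^2}{2\sigma_0^2}\right).$$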


If we again apply the corollary to Bayes' formula under the
assumption of equal probability of the possible sets $(a, b, c)$, we obtain

$$P(a, b, c \mid l_1, l_2, \ldots, l_n) = M \exp\left(-\frac{\sum_{k=1}^{n} p_k \delta_k^2}{2\sigma_0^2}\right), \tag{16.13}$$

where $M$ represents all of the factors not depending on $a, b, c$
that appear in the preceding equation, and also the proportionality
constant in the corollary to Bayes' formula.

It follows from this that the most probable set of values of
$a, b, c$ is obtained if we require that the sum

$$\sum_{k=1}^{n} p_k \delta_k^2$$

be minimized. Thus, we come to a generalization of Legendre's
principle, which we can formulate as follows: 

If the conditional equations are unequally precise, we should 
find values of the unknowns that will minimize the sum of the 
products of the squares of the discrepancies and the weights 
(the weighted sum of the squares of the discrepancies). 

If the equations are equally precise, all the weights can be 
considered equal to unity, and we have Legendre’s principle in its 
simple form. 

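From the generalization of Legendre's principle, it is easy to
obtain a rule for reducing unequally precise conditional equations
to equally precise ones. We write the weighted sum of the
squares of the discrepancies as follows:

$$\sum_{k=1}^{n} p_k \delta_k^2 = \sum_{k=1}^{n} \left(\delta_k \sqrt{p_k}\right)^2.$$

We use the notation $\varepsilon_k = \delta_k \sqrt{p_k}$. Then,

$$\sum_{k=1}^{n} p_k \delta_k^2 = \sum_{k=1}^{n} \varepsilon_k^2.$$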


This means that the weighted sum of the squares of the discrepancies
can be considered the sum of the squares of the discrepancies
for a set of equally precise conditional equations that have
discrepancies $\varepsilon_k$. The given equations for the discrepancies

$$\delta_k = f_k(a, b, c) - l_k$$

can easily be converted into equations with discrepancies $\varepsilon_k$ if we
consider the relationship between $\delta_k$ and $\varepsilon_k$. Thus, we have the
following rule:

To reduce unequally precise conditional equations to equally 
precise ones, we should multiply each conditional equation by the 
square root of its weight. 
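
The rule can be checked on a small numerical case. The sketch below is only an illustration: the three linear conditional equations a_k x + b_k y + l_k = 0, their weights, and the trial values of the unknowns are all hypothetical. It shows that the weighted sum of the squares of the original discrepancies equals the plain sum of the squares of the discrepancies of the reduced equations.

```python
import math

def discrepancy(row, unknowns):
    """Discrepancy of one linear conditional equation (coefficients..., free term)."""
    *coefficients, free_term = row
    return sum(c * u for c, u in zip(coefficients, unknowns)) + free_term

def reduce_to_equal_precision(rows, weights):
    """Multiply every conditional equation by the square root of its weight."""
    return [[math.sqrt(p) * value for value in row] for row, p in zip(rows, weights)]

# Hypothetical equations a_k*x + b_k*y + l_k = 0 with weights p_k.
rows = [[1.0, 1.0, -3.1], [1.0, 2.0, -4.9], [1.0, 3.0, -7.2]]
weights = [1.0, 4.0, 0.25]
trial = (1.0, 2.0)  # arbitrary trial values of x and y

# Weighted sum of squares for the original equations.
weighted_sum = sum(p * discrepancy(r, trial) ** 2 for r, p in zip(rows, weights))

# Plain sum of squares for the reduced (equally precise) equations.
reduced_rows = reduce_to_equal_precision(rows, weights)
reduced_sum = sum(discrepancy(r, trial) ** 2 for r in reduced_rows)
```

Since the two sums agree for every trial set of unknowns, they are minimized by the same values, which is why the reduced system may be treated as equally precise.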

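The normal equations in the case of unequally precise conditional
equations are obtained by minimizing the function of
several arguments:

$$S_p = \sum_{k=1}^{n} p_k \delta_k^2, \qquad \frac{\partial S_p}{\partial a} = 0, \quad \frac{\partial S_p}{\partial b} = 0, \quad \frac{\partial S_p}{\partial c} = 0, \ \ldots$$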


81. THE REDUCTION OF NONLINEAR CONDITIONAL 
EQUATIONS TO LINEAR FORM 


It is possible to set up normal equations no matter what the form
of the conditional equations, that is, for arbitrary functions
$f_k(a, b, c)$. The solution of normal equations in the case of nonlinear
conditional equations, on the other hand, can present great
difficulties. What is still more important is the fact that here the
roots of the normal equations will not be linear functions of the
random numbers $l_k$. The significance of this fact is due to the
following considerations. The most probable values of the unknowns,
being roots of the normal equations, are obviously functions of the random
variables $l_k$ and, consequently, are themselves random. To
determine the possible fluctuations of the most probable values of the
unknowns, we need to know at least the variances of the unknowns.
Note that in speaking of the variances of the unknowns, we must
consider not only the chosen set of numbers $l_k$ obtained in the
series of conditional equations in question but also all possible
series of values of $l_k$ that can be obtained due to other
combinations of the causes that produce random errors. To each such series
will correspond a set of most probable values $a, b, c, \ldots$. This
family of sets constitutes the region of possible values of $a, b, c, \ldots$.
We may set $E(a) = a$, $E(b) = b$, $E(c) = c$, ... and determine the
variances of the random variables $a, b, c, \ldots$; this will give an idea of
the reliability of the calculated values of $a, b, c, \ldots$ obtained from
the set of measurements.

It is this necessity of determining the variances of $a, b, c, \ldots$
that compels us to consider the matter of the linearity of the
conditional equations. In the linear case, not only is the solution of the normal
equations simplified, but so is the problem of determining the
variances of the unknowns. This is true because linear conditional
equations give linear normal equations. Their roots will be linear
functions of the numbers $l_k$, in other words, linear functions of the
random errors. If we may consider these errors mutually
independent, we have a simple problem of determining the variance of
a linear function of independent variables. This is why the
method of least squares is developed only for linear conditional
equations.

In the case in which we are given nonlinear conditional equations 
(for example, in the setting up of an empirical formula that is 
nonlinear with respect to the parameters), they must be reduced 
to linear form.* We can show two methods of reducing the equations 
to linear form. 

The first method, which is quite simple, consists in making a
change of variables that, after suitable manipulations, will render
the conditional equations linear with respect to the new unknowns.


Example 1. Suppose that the conditional equations are of the form

$$\alpha_k \sin(a + b) + \beta_k \sin(a - b) + \gamma_k e^{-2c} = l_k, \quad k = 1, 2, \ldots, n,$$

where $\alpha_k$, $\beta_k$, $\gamma_k$, and $l_k$ are given numbers and $a$, $b$, and $c$ are unknowns. Then, by
making the substitution

$$\sin(a + b) = x, \quad \sin(a - b) = y, \quad e^{-2c} = z,$$

we obtain the linear equations

$$\alpha_k x + \beta_k y + \gamma_k z - l_k = 0.$$

After solving this system, we obtain the most probable values of the unknowns $x$, $y$, and
$z$. These random values are linear functions of the random variables $l_k$ and, because of
the postulate regarding a normal law for the random errors, we may assert that $x$, $y$,
and $z$ are also normally distributed. It follows from the formulas relating $a$, $b$, and $c$
to $x$, $y$, and $z$ that

$$a = \tfrac{1}{2}(\arcsin x + \arcsin y), \quad b = \tfrac{1}{2}(\arcsin x - \arcsin y), \quad c = -\tfrac{1}{2} \ln z.$$

Since the distribution laws of $x$, $y$, and $z$ are known (normal), we may find the
distribution laws of $a$, $b$, and $c$. Consequently, we may also find their variances, which
completely solves the problem in the probabilistic sense.
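
The return from the auxiliary unknowns of Example 1 to the original ones can be sketched as follows. This is only an illustration: the function name and the test values are hypothetical, and it is assumed that a + b and a - b lie in the range of the principal values of the arcsine.

```python
import math

def recover_unknowns(x, y, z):
    """Invert the substitution x = sin(a + b), y = sin(a - b), z = exp(-2c).

    Assumption: a + b and a - b lie in [-pi/2, pi/2], so that the principal
    values of the arcsine return them unchanged.
    """
    if not (-1.0 <= x <= 1.0 and -1.0 <= y <= 1.0 and z > 0.0):
        raise ValueError("x and y must lie in [-1, 1] and z must be positive")
    a = 0.5 * (math.asin(x) + math.asin(y))
    b = 0.5 * (math.asin(x) - math.asin(y))
    c = -0.5 * math.log(z)
    return a, b, c

# Hypothetical values of the original unknowns, used only for a round trip.
a_true, b_true, c_true = 0.4, 0.1, 0.3
x = math.sin(a_true + b_true)
y = math.sin(a_true - b_true)
z = math.exp(-2.0 * c_true)
a, b, c = recover_unknowns(x, y, z)
```

The range check matters in practice: a least-squares estimate of x or y may fall slightly outside [-1, 1] because of random errors, in which case the substitution cannot be inverted.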


However, this method is by no means always applicable. 

A general method, which is suitable for all conditional equations, 
rests on the assumption that the discrepancies in the conditional 
equations are small, for which we need to assume that the coeffi- 
cients of the conditional equations are exact and that the random 
errors in the numbers /, are small (in absolute value). In other 
words, the incompatibility of the system of conditional equations 
is ‘‘weak,”’ 

*This operation is sometimes called linearization of the equations.

Let us now find preliminary approximate values of the unknowns
by some method and let us denote them by $a_0$, $b_0$, and $c_0$. We can
demonstrate two methods that can be used for choosing the original
approximation. The first consists in choosing from the given
system as many conditional equations as there are unknowns,
taking care to see that the equations chosen are approximately
evenly distributed over the system. The solution of the chosen
equations yields the numbers $a_0$, $b_0$, and $c_0$. The second method is
applicable to those problems whose solutions are obtained
gradually as the amount of observational data increases. In astronomical
work, such a problem is the determination of the fundamental
constants (the parallax of the sun, the precessional constant, etc.).
Initial research of this type gives values of the unknowns that can
be taken as the original values when new research is done.
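Let us set

$$a = a_0 + x_1, \quad b = b_0 + y_1, \quad c = c_0 + z_1, \ \ldots \tag{16.14}$$

We now substitute these expressions into the conditional equations,
expand the functions $f_k$ in series of powers of $x_1, y_1, z_1, \ldots$, and
keep only the first order terms of these corrections.

We then obtain linear conditional equations of the form

$$f_k(a_0, b_0, c_0, \ldots) - l_k + \left(\frac{\partial f_k}{\partial a}\right)_0 x_1 + \left(\frac{\partial f_k}{\partial b}\right)_0 y_1 + \left(\frac{\partial f_k}{\partial c}\right)_0 z_1 + \ldots = 0, \tag{16.15}$$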


where the subscript zero by the partial derivatives indicates that,
after the differentiation, we should take $a = a_0$, etc. Solution of the
system of linear conditional equations by a method to be shown
below yields the probable values $x_1$, $y_1$, and $z_1$ and the mean square
errors $\sigma_{x_1}$, $\sigma_{y_1}$, $\sigma_{z_1}$. We then make the approximation

$$a_1 = a_0 + x_1, \quad b_1 = b_0 + y_1, \quad c_1 = c_0 + z_1, \ \ldots$$


with the above mean square errors, since $a_0, b_0, c_0, \ldots$ are fixed
(nonrandom) numbers. We proceed with this approximation just as
with the original one; that is, we set up a linear system of the form

$$f_k(a_1, b_1, c_1) - l_k + \left(\frac{\partial f_k}{\partial a}\right)_1 x_2 + \left(\frac{\partial f_k}{\partial b}\right)_1 y_2 + \left(\frac{\partial f_k}{\partial c}\right)_1 z_2 + \ldots = 0,$$

where the subscript 1 by the partial derivative indicates that after
the differentiation we set $a = a_1$, etc. Solution of the system yields
values $x_2, y_2, z_2, \ldots$ and mean square errors $\sigma_{x_2}, \sigma_{y_2}, \sigma_{z_2}, \ldots$. Then,

$$a_2 = a_1 + x_2, \quad b_2 = b_1 + y_2, \quad c_2 = c_1 + z_2, \ \ldots$$

or

$$a_2 = a_0 + x_1 + x_2, \quad b_2 = b_0 + y_1 + y_2, \quad c_2 = c_0 + z_1 + z_2, \ \ldots$$

In this case, the mean square errors are found from the theorem
on the variance of a sum:

$$\sigma_{a_2} = \sqrt{\sigma_{x_1}^2 + \sigma_{x_2}^2}$$

(and similar formulas for $b_2, c_2, \ldots$).




For every given system, the question arises as to the
convergence of this process of successive approximations. Since formal
investigation of the convergence is rather complicated, what one
usually does is see whether the approximations are in fact
converging. All the calculations are made with a finite number of
digits after the decimal point (or with a finite number of significant
figures). If the same value is found for two approximations in a
row (with the degree of accuracy that is being used), it is assumed
that the process of successive approximations is concluded and the
values $a$, $b$, $c$ that are obtained are considered final.

This method of linearization is rather tedious. Therefore, it
is used only rarely, e.g., only in very important problems. If the
method of substitution cannot be applied in a problem, investigators
usually prefer to replace the given conditional equations approximately
with others to which the first method is applicable.

Example 2. Suppose that the following table gives the values of an
approximately periodic function:

      t        w
    0.00     2.02
    0.52     1.86
    1.04     1.49
    1.56     1.02
    2.08     0.51
    2.60    +0.14
    3.12    -0.03
    3.64    +0.15
    4.16     0.53
    4.68     0.97
    5.20     1.46
    5.72     1.90
    6.24     1.98
    Sum     14.00

Determine the parameters $a$, $b$, $c$, and $P$ of the law of harmonic oscillation
approximating it:

$$w = a \sin\left(\frac{2\pi t}{P} + b\right) + c,$$

assuming that there are random errors in the values given for $w$ and that the
measurements are equally precise.

If we substitute the values given for $t_k$ and $w_k$ into the above equation, we obtain
a system of 13 nonlinear conditional equations:

$$a \sin\left(\frac{2\pi t_k}{P} + b\right) + c = w_k, \quad k = 0, 1, 2, \ldots, 12.$$


From the table and the graph constructed on the basis of it, we
may take the following initial approximate values for the unknowns:

$P_0 = 6.30$ (since, for $t = 6.24$, the initial value 2.02 has not yet been re-attained);

$c_0 = 1.08$ and $a_0 = 0.90$ (since the maximum value is close to 2 and among the values
there is one close to 0, so that $a$ and $c$ are of the same order; for the value of $c_0$, we
take the arithmetic mean of all the $w$);

$b_0 = 1.50$, which may be obtained by a crude comparison of the graph with a sine
curve and a determination of the displacement.

Let us set

$$a = a_0 + x_1, \quad b = b_0 + y_1, \quad c = c_0 + z_1, \quad P = P_0 + u_1.$$


By linearization, we obtain the following system of linear conditional equations:

$$\sin\tau_k \cdot x_1 + a_0 \cos\tau_k \cdot y_1 - a_0 \cos\tau_k \, \frac{2\pi t_k}{P_0^2} \, u_1 + z_1 + (a_0 \sin\tau_k + c_0 - w_k) = 0 \quad (k = 0, 1, 2, \ldots, 12),$$

where

$$\tau_k = \frac{2\pi t_k}{P_0} + b_0.$$

We rewrite this system in the form

$$a_k x_1 + b_k y_1 + c_k z_1 + d_k u_1 + l_k = 0 \quad (k = 0, 1, 2, \ldots, 12),$$

where

$$a_k = \sin\tau_k, \quad b_k = a_0 \cos\tau_k, \quad c_k = 1, \quad d_k = -a_0 \cos\tau_k \, \frac{2\pi t_k}{P_0^2}, \quad l_k = a_0 \sin\tau_k + c_0 - w_k.$$


In the following sections, a method (based on Legendre's principle) will be examined
for solving such a system of conditional equations, and the values of $x_1$, $y_1$, $z_1$, and $u_1$
in this example will be determined. Here, we only note the results of the determination
of the first approximation:

$$x_1 = +0.0973, \quad y_1 = +0.0875, \quad z_1 = -0.0749, \quad u_1 = -0.0354.$$

The initial values of the unknowns and the result of the first approximation are shown
in the first two columns of the table:

    a_0 = 0.90     a_1 = 0.9973     (a) = 1
    b_0 = 1.50     b_1 = 1.5875     (b) = pi/2 = 1.5708
    c_0 = 1.08     c_1 = 1.0051     (c) = 1
    P_0 = 6.30     P_1 = 6.2646     (P) = 6.2832 = 2 pi


We note that an artificial device was used for constructing this example. The exact
values of $w$ were calculated for values of $t$ taken with a constant step, using the values of the
parameters $(a)$, $(b)$, $(c)$, and $(P)$ appearing in the third column of this table. These
values were increased or decreased in a random manner by 1, 2, or 3 hundredths.
The values of $t$ are given in radians with accuracy up to one half of a hundredth.
In this way the values of the tabulated function $w = w(t)$ were obtained; the approximating function
was constructed for these values. The results of the calculations show that the first
approximation already gives values of the parameters close to the original values.
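
The successive approximations described in this section can be sketched for the data of Example 2. This is only an illustration under stated assumptions: the helper names are hypothetical, the normal equations are solved by ordinary Gaussian elimination rather than by the scheme of the later sections, and the coefficients are computed in floating point, so the corrections need not agree with the printed first approximation in all digits.

```python
import math

# Data of Example 2 (t in radians, w the tabulated function).
t = [0.52 * k for k in range(13)]
w = [2.02, 1.86, 1.49, 1.02, 0.51, 0.14, -0.03, 0.15, 0.53, 0.97, 1.46, 1.90, 1.98]

def conditional_rows(a, b, c, P):
    """Coefficients a_k, b_k, c_k, d_k and free term l_k of the linearized equations."""
    rows = []
    for tk, wk in zip(t, w):
        tau = 2 * math.pi * tk / P + b
        rows.append([
            math.sin(tau),                                   # multiplies x (correction to a)
            a * math.cos(tau),                               # multiplies y (correction to b)
            1.0,                                             # multiplies z (correction to c)
            -a * math.cos(tau) * 2 * math.pi * tk / P ** 2,  # multiplies u (correction to P)
            a * math.sin(tau) + c - wk,                      # free term l_k
        ])
    return rows

def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting (hypothetical helper)."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(aug[r][i]))
        aug[i], aug[pivot] = aug[pivot], aug[i]
        for r in range(i + 1, n):
            factor = aug[r][i] / aug[i][i]
            for col in range(i, n + 1):
                aug[r][col] -= factor * aug[i][col]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][col] * x[col] for col in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def corrections(rows):
    """Corrections obtained from the normal equations of the linearized system."""
    m = len(rows[0]) - 1
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    rhs = [-sum(r[i] * r[m] for r in rows) for i in range(m)]
    return solve(normal, rhs)

def sum_of_squares(params):
    """Sum of the squares of the discrepancies for given a, b, c, P."""
    return sum(r[-1] ** 2 for r in conditional_rows(*params))

params = [0.90, 1.50, 1.08, 6.30]  # a_0, b_0, c_0, P_0 from the example
initial_s = sum_of_squares(params)
for _ in range(6):  # a fixed, assumed number of approximations
    step = corrections(conditional_rows(*params))
    params = [p + d for p, d in zip(params, step)]
final_s = sum_of_squares(params)
```

In practice the loop would be stopped, as the text describes, when two successive approximations agree to the accuracy being used; a fixed count is used here only to keep the sketch short.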


82. LINEAR CONDITIONAL AND NORMAL EQUATIONS 


Suppose that we have the system of linear conditional equations*

$$a_k x + b_k y + c_k z + d_k u + l_k = 0 \quad (k = 1, 2, \ldots, n), \tag{16.16}$$

where $x$, $y$, $z$, and $u$ are unknowns and $a_k$, $b_k$, $c_k$, $d_k$, and $l_k$ are
numbers that vary from equation to equation, and where $n$ is the
number of conditional equations. It should be remembered that we
are assuming that only the numbers $l_k$ contain random errors. The
numbers $a_k$, $b_k$, $c_k$, and $d_k$ may be approximate, but, by hypothesis,
they do not contain random errors. We shall call the numbers $l_k$ the
free terms and we shall call $a_k$, $b_k$, $c_k$, and $d_k$ the coefficients.

*For brevity in writing, we shall confine ourselves to the case of four unknowns in all
future calculations in this chapter.

The discrepancies $\delta_k$ of the equations are found from the
relations

$$\delta_k = a_k x + b_k y + c_k z + d_k u + l_k \quad (k = 1, 2, \ldots, n), \tag{16.17}$$

in which $x$, $y$, $z$, and $u$ refer to as yet arbitrary numbers. There
is a sort of contradiction between the systems of equations (16.16)
and (16.17). In connection with this, we should remember that the
system of given conditional equations is incompatible and therefore
the system (16.16) has only a formal meaning. The sum of the
squares of the discrepancies (16.17) is of the form

$$S = \sum_{k=1}^{n} \delta_k^2 = \sum_{k=1}^{n} (a_k x + b_k y + c_k z + d_k u + l_k)^2. \tag{16.18}$$


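Assuming the conditional equations equally precise, we use
Legendre's principle to write the necessary conditions for a
minimum:

$$\begin{aligned}
\frac{\partial S}{\partial x} &= 2 \sum_{k=1}^{n} (a_k x + b_k y + c_k z + d_k u + l_k)\, a_k = 0, \\
\frac{\partial S}{\partial y} &= 2 \sum_{k=1}^{n} (a_k x + b_k y + c_k z + d_k u + l_k)\, b_k = 0, \\
\frac{\partial S}{\partial z} &= 2 \sum_{k=1}^{n} (a_k x + b_k y + c_k z + d_k u + l_k)\, c_k = 0, \\
\frac{\partial S}{\partial u} &= 2 \sum_{k=1}^{n} (a_k x + b_k y + c_k z + d_k u + l_k)\, d_k = 0.
\end{aligned} \tag{16.19}$$

These conditions lead to the normal equations:

$$\begin{aligned}
x \sum a_k^2 + y \sum a_k b_k + z \sum a_k c_k + u \sum a_k d_k + \sum a_k l_k &= 0, \\
x \sum b_k a_k + y \sum b_k^2 + z \sum b_k c_k + u \sum b_k d_k + \sum b_k l_k &= 0, \\
x \sum c_k a_k + y \sum c_k b_k + z \sum c_k^2 + u \sum c_k d_k + \sum c_k l_k &= 0, \\
x \sum d_k a_k + y \sum d_k b_k + z \sum d_k c_k + u \sum d_k^2 + \sum d_k l_k &= 0
\end{aligned} \tag{16.20}$$

(the summations being taken from $k = 1$ to $k = n$).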

The normal equations are set up analogously for an arbitrary
number of unknowns.

The question of the sufficiency of these conditions for a minimum
of a function of several arguments can be investigated by the familiar
means of analysis, but there is no need to do this. The sum of the
squares of the discrepancies $S$ is a quadratic form in the arguments
$x$, $y$, $z$, and $u$. Therefore, it can have only one extremum. This
form is positive-definite. Therefore, it must have a minimum and
the roots of the normal equations determine that minimum.

In astronomy and geodesy, Gauss' notation is almost always used
in writing the normal equations:

$$\sum_{k=1}^{n} a_k^2 = [aa], \quad \sum_{k=1}^{n} a_k b_k = [ab], \ \ldots, \quad \sum_{k=1}^{n} a_k l_k = [al] \tag{16.21}$$



and so forth. In this notation, the system of normal equations with
four unknowns becomes

$$\begin{aligned}
[aa]\,x + [ab]\,y + [ac]\,z + [ad]\,u + [al] &= 0, \\
[ba]\,x + [bb]\,y + [bc]\,z + [bd]\,u + [bl] &= 0, \\
[ca]\,x + [cb]\,y + [cc]\,z + [cd]\,u + [cl] &= 0, \\
[da]\,x + [db]\,y + [dc]\,z + [dd]\,u + [dl] &= 0.
\end{aligned} \tag{16.22}$$

We then have the obvious equations

$$[ba] = [ab], \quad [ca] = [ac], \quad \text{etc.}$$
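
Gauss' brackets translate directly into code. The following sketch is an illustration only: the function names and the four conditional equations with two unknowns are hypothetical. It forms the array of brackets and solves the resulting pair of normal equations by determinants.

```python
def bracket(u, v):
    """Gauss' bracket [uv]: the sum of the products u_k * v_k over all equations."""
    return sum(uk * vk for uk, vk in zip(u, v))

def normal_equations(rows):
    """Array of brackets for conditional equations given as rows (a_k, b_k, ..., l_k).

    Row i holds the coefficients of the i-th normal equation followed by its
    free term, in the arrangement of (16.22).
    """
    columns = list(zip(*rows))
    m = len(columns) - 1  # number of unknowns
    return [[bracket(columns[i], columns[j]) for j in range(m + 1)] for i in range(m)]

# Hypothetical conditional equations a_k*x + b_k*y + l_k = 0.
rows = [(1.0, 0.0, -1.1), (0.0, 1.0, -1.9), (1.0, 1.0, -3.2), (1.0, -1.0, 0.9)]
(aa, ab, al), (ba, bb, bl) = normal_equations(rows)

# Solve [aa]x + [ab]y + [al] = 0, [ba]x + [bb]y + [bl] = 0 by determinants.
det = aa * bb - ab * ba
x = (-al * bb + ab * bl) / det
y = (-aa * bl + ba * al) / det
```

Because [ab] = [ba], only the brackets on and above the principal diagonal need to be computed in hand work; the sketch computes all of them for clarity.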


If unequally precise linear conditional equations and their
weights $p_k$ are given, we must apply the general rule for reducing
unequally precise equations to equally precise ones. Each
conditional equation should be multiplied by the square root of its
weight. The transformed system of conditional equations will take
the form

$$a_k \sqrt{p_k}\, x + b_k \sqrt{p_k}\, y + c_k \sqrt{p_k}\, z + d_k \sqrt{p_k}\, u + l_k \sqrt{p_k} = 0.$$

If we apply Legendre's principle to this system and set up the
normal equations, we obtain

$$\begin{aligned}
[paa]\,x + [pab]\,y + [pac]\,z + [pad]\,u + [pal] &= 0, \\
[pba]\,x + [pbb]\,y + [pbc]\,z + [pbd]\,u + [pbl] &= 0, \\
[pca]\,x + [pcb]\,y + [pcc]\,z + [pcd]\,u + [pcl] &= 0, \\
[pda]\,x + [pdb]\,y + [pdc]\,z + [pdd]\,u + [pdl] &= 0,
\end{aligned} \tag{16.23}$$

where

$$[paa] = \sum_{k=1}^{n} p_k a_k^2, \quad [pab] = \sum_{k=1}^{n} p_k a_k b_k, \quad \text{etc.} \tag{16.24}$$


The coefficient matrix for the unknowns of the system of normal
equations possesses the following two properties: (1) It is
symmetric about the principal diagonal; (2) all the elements of the
principal diagonal are positive numbers. The second property is a
result of the fact that all these elements are the sums of the squares
of the coefficients of successive unknowns in the conditional
equations (or the sums of the products formed when the squares of the
coefficients are multiplied by the weights, which are positive
numbers). Such a sum can vanish only if each addend is equal to 0.
But this would mean that the conditional equations would contain no
terms at all with the corresponding unknowns. Therefore, we do not
need to consider this case.

If we switch terms or even equations in the set of normal
equations, we shall only get permutations, which conserve these two
matrix properties.




We shall not examine the question of the existence of a solution
to the system of normal equations for the general case. We note
only that solutions may fail to exist in exceptional cases. To show
this, let us consider the system of conditional equations with two
unknowns:

$$a_k x + b_k y + l_k = 0.$$

The determinant of the system of normal equations can be
transformed by use of Lagrange's identity:

$$\sum_{k=1}^{n} a_k^2 \sum_{k=1}^{n} b_k^2 - \left(\sum_{k=1}^{n} a_k b_k\right)^2 = \frac{1}{2} \sum_{k=1}^{n} \sum_{m=1}^{n} (a_k b_m - a_m b_k)^2.$$

From this it is clear that the determinant will vanish if $a_k / b_k = a_m / b_m$
for all $k$ and $m$, that is, if the ratio of the coefficients is constant.
If the ratios of the free terms $l_k$ to these coefficients are also equal to
one and the same constant, the problem is indeterminate.


If the numbers $l_k$ are arbitrary, Legendre's condition cannot be
satisfied by finite values of the unknowns, since we have a system
of equations with two unknowns in which the terms containing the
unknowns are the same, while the free terms are different.

In the general case of an arbitrary number of unknowns, the 
relationships between the coefficients of the unknowns that will 
result in the vanishing of the determinant of the system of normal 
equations are more complicated. In problems of the usual type, in 
which these coefficients are arbitrary, the determinant does not 
vanish and the problem has a unique solution. 
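
For two unknowns, the behavior of the determinant can be verified directly. In the sketch below the function names and the coefficient lists are hypothetical; integers are used so that the comparison is exact. The right side of Lagrange's identity is taken as one half of the full double sum over k and m.

```python
def normal_determinant(a, b):
    """Determinant [aa][bb] - [ab]^2 of the normal equations with two unknowns."""
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    ab = sum(x * y for x, y in zip(a, b))
    return aa * bb - ab * ab

def lagrange_sum(a, b):
    """Right side of Lagrange's identity: half the double sum of (a_k*b_m - a_m*b_k)^2."""
    n = len(a)
    total = sum((a[k] * b[m] - a[m] * b[k]) ** 2 for k in range(n) for m in range(n))
    return total / 2

# Coefficients in a constant ratio: the determinant vanishes.
proportional = normal_determinant([1, 2, 3], [2, 4, 6])

# Coefficients not in a constant ratio: the determinant is positive.
general = normal_determinant([1, 2, 3], [1, 0, 2])
general_by_identity = lagrange_sum([1, 2, 3], [1, 0, 2])
```

With measured (rounded) coefficients the determinant will rarely be exactly zero; a value that is very small compared with the product [aa][bb] signals the same difficulty.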


We note without proof that a necessary and sufficient condition for the determinant of
the system of normal equations not to vanish is the maximality of the rank of the matrix
of the system of conditional equations (that is, equality of this rank with the number of
unknowns). Also, we note that, in practice, it is easier to evaluate the determinant of the
normal system than it is to see whether this condition is satisfied or not.


Setting up the linear system of normal equations is a tedious
process. If the number of conditional equations is $n$ and the number
of unknowns is $m$, we need to perform $n \, \dfrac{m(m+3)}{2}$ multiplications
with two factors appearing in each and $\dfrac{m(m+3)}{2}$ additions with $n$
addends in each. For example, if $m = 4$ and $n = 10$, we would need
to perform 140 multiplications and 14 additions. Although this
work is elementary, the tediousness of it could dull the attention of
the person making the calculations so that he would tend to make
errors. If $n$ is great, it would be sensible to use a mechanical
computer. The simplest mechanization would be calculation of the
coefficients in the normal equations on a calculating machine by the
method of accumulation. (The first product of two factors is left on
the register of the calculating machine, to it is added the following
product, etc.)
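
The count of operations follows from the symmetry of the normal equations: there are m(m + 1)/2 distinct brackets among the coefficients and m brackets among the free terms, that is, m(m + 3)/2 brackets in all, each a sum of n products. A minimal sketch (the function name is hypothetical):

```python
def normal_equation_work(n, m):
    """Return (multiplications, additions) needed to set up the normal equations.

    n is the number of conditional equations, m the number of unknowns.
    Each of the m*(m + 3)/2 distinct brackets is one addition of n products.
    """
    brackets = m * (m + 1) // 2 + m  # equals m*(m + 3)/2
    return n * brackets, brackets

multiplications, additions = normal_equation_work(n=10, m=4)
```

For n = 10 and m = 4 this reproduces the figures of the text, 140 multiplications and 14 additions.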




83. A CHECK ON THE SETTING UP OF THE 
NORMAL EQUATIONS 


In both mechanical and pencil-and-paper calculations, it is
necessary to check whether the normal equations are set up properly.
This check is ordinarily made in the following simple manner.
In each conditional equation, we form the sum $s_k$ of the coefficients
of all the unknowns and of the free term. In other words, $s_k$ is the
sum of the elements of the $k$th row of the augmented matrix of the
system of equations. The values found for the numbers $s_k$ must be
checked. To do this, we calculate by columns the sums of all the
elements in the augmented matrix; that is, we find the sums of the
coefficients of all the unknowns and the sum of the free terms.
At the same time, we calculate the sum of all the numbers $s_k$. Using
Gauss' notation, we denote the last sum by $[s]$ and the column
sums of the elements by $[a]$, $[b]$, $[c]$, $[d]$, $[l]$.

We have the obvious checking equation

$$([s]) = [a] + [b] + [c] + [d] + [l],$$


where the parentheses denote the checking number. If the additions
referred to are performed formally exactly, the control equation
must be exactly satisfied. It is easy to make such a check on a
tabulator if the number of conditional equations is great. After
calculating and checking the numbers $s_k$, we form the control sums
of the products

$$[as], \quad [bs], \quad [cs], \quad [ds].$$

The number of these sums is equal to the number of unknowns.
It is easy to see that we get a check of the normal equations from
the following equations (in a problem with four unknowns):

$$\begin{aligned}
[aa] + [ab] + [ac] + [ad] + [al] &= [as], \\
[ba] + [bb] + [bc] + [bd] + [bl] &= [bs], \\
[ca] + [cb] + [cc] + [cd] + [cl] &= [cs], \\
[da] + [db] + [dc] + [dd] + [dl] &= [ds].
\end{aligned} \tag{16.25}$$


The left sides of these equations represent the sums of the
coefficients of the unknowns and the free terms of the consecutive
normal equations.

Let us agree to number the unknowns in the order in which they
are written in the conditional equations and to use the same
numbering for the consecutive normal equations. Then, the checking rule
can be stated as follows.

To check the $m$th normal equation, we must take the sum
of the products of the numbers $s_k$ and the coefficients (with
subscript $k$) of the $m$th unknown. This sum should be equal to the sum
of the coefficients of the unknowns and the free term of the
normal equation that is being checked.

To ensure a reliable check, it is convenient, when performing 
the multiplications, to ignore at first the fact that the coefficients 
are approximate numbers and to take the coefficients of the normal 
equations with all digits that are formally obtained. Then, the 
control equations must be exactly satisfied. It is expedient to 
make the check in the same way for all the normal equations. 
This makes it easier to find the errors in case there is a
discrepancy in the check. For example, if, in the case of four
unknowns, there is a discrepancy in the check only in the third normal
equation, it will be necessary to check in turn the coefficients
[cc], [cl], and the control sum. The reason is obvious: the remaining 
coefficients in this equation appear in other equations in which the 
control does not indicate an error and which may therefore be 
considered reliable. 

For the check, we need in total: $n$ additions with $m+1$ addends
(the numbers $s_k$) in each, $m+2$ additions with $n$ addends in each
(for checking the $s_k$), $nm$ multiplications with two factors in each,
$m$ additions with $n$ addends in each, and $n$ additions with $m+1$
addends in each. If $m = 4$ and $n = 10$, we have the following total
amount of computation:

10 additions of 5 addends,
6 additions of 10 addends,
40 multiplications of 2 factors,
4 additions of these products,
10 additions of 5 addends for
checking the normal equations.


Example. In Example 2 of Section 81, a system of 13 linear conditional equations with
four unknowns $x_1$, $y_1$, $z_1$, and $u_1$ was obtained in literal form by means of linearization.
If we substitute the values taken for the quantities $a_0$, $b_0$, $c_0$, $P_0$, $t_0$, $t_1$, ..., $t_{12}$ into the
coefficients $a_k$, $b_k$, $c_k$, $d_k$, and $l_k$, we can write this system in the following schematic
form:

      x_1       y_1      z_1      u_1        L          S
     0.998     0.061      1      0.000    -0.042      2.017
     0.897    -0.398      1      0.033     0.027      1.559
     0.566    -0.742      1      0.122     0.099      1.045
     0.081    -0.897      1      0.222     0.133      0.539
    -0.416    -0.819      1      0.269     0.196      0.230
    -0.812    -0.525      1      0.216     0.209      0.088
    -0.995    -0.093      1      0.046     0.215      0.173
    -0.915     0.365      1     -0.210     0.107      0.347
    -0.591     0.725      1     -0.478     0.018      0.674
    -0.113     0.895      1     -0.663     0.008      1.127
     0.386     0.831      1     -0.683    -0.033      1.501
     0.793     0.547      1     -0.495    -0.106      1.739
     0.991     0.120      1     -0.118    -0.008      1.985

    +0.870    +0.070    +13     -1.739    +0.823     13.024
                                                     (13.024)




The column headings indicate those unknowns whose coefficients are given in the columns.
The free terms are given in the column labeled $L$. The column labeled $S$, giving the
control sums, is obtained by adding the numbers in the different columns (in the same
row). They are checked by means of the last row, which gives the sums of the
numbers appearing in each column. The number in parentheses in the
last column is the control sum 13.024, which is obtained by adding the sums of the first
five columns. Since it is exactly equal to $[s]$, we may assume that there are no errors
in the column $S$ (provided that no errors were made in this column that canceled each
other out when the addition was made).

The sums of the products of the coefficients of the unknowns and the free terms in
the conditional equations (which yield the coefficients of the normal equations) are
easily calculated mechanically without writing down the individual products. This
considerably reduces the amount of computational labor in comparison with the old methods,
whereby each product was calculated using a table of logarithms. The control sums for
all the normal equations are calculated at the same time as the coefficients of the
normal equations. For a complete check, all the digits that are formally obtained when
the multiplications are performed are written down. The results are written in the form
of a matrix with column heads denoting the unknowns, the free terms $L$, and the control
sums $S$.


       x_1          y_1          z_1          u_1           L            S
     6.909156     0.080654     0.870000    -0.439975    -0.630230     6.789605
     0.080654     4.936378     0.070000    -2.509864    -0.523411     2.053757
     0.870000     0.070000    13.000000    -1.739000    +0.823000    13.024000
    -0.439975    -2.509864    -1.739000     1.623981    +0.189828    -2.875030


The last column is a check. It enables one to check the coefficients of the normal
equations by comparing the number of the last column with the sum of the other numbers
in the same row. Since the multiplications are formally computed exactly, each
$S_j$ (for $j = 1, 2, 3, 4$) must be exactly equal to the corresponding sum. If this turns out not to be the
case, there must be an error either in the coefficients in the normal equations or in the
control sum. It should be noted that errors of the latter kind are frequently encountered.
This is explained by the tediousness of the calculations in making the check at the end
of the entire process and by a drop in attentiveness on the part of the person making
the calculations.
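
The check described in this section can be repeated on the conditional equations of the example. The sketch below is an illustration: the function and variable names are hypothetical, and the rows are transcribed from the table of the example, with poorly legible entries restored from the row and column control sums (an assumption of this transcription). Floating-point arithmetic is used, so the control equations hold only to rounding error rather than exactly.

```python
def bracket(u, v):
    """Gauss' bracket [uv]: the sum of the products u_k * v_k."""
    return sum(uk * vk for uk, vk in zip(u, v))

# Rows (a_k, b_k, c_k, d_k, l_k) of the 13 conditional equations of the example.
rows = [
    (0.998, 0.061, 1.0, 0.000, -0.042),
    (0.897, -0.398, 1.0, 0.033, 0.027),
    (0.566, -0.742, 1.0, 0.122, 0.099),
    (0.081, -0.897, 1.0, 0.222, 0.133),
    (-0.416, -0.819, 1.0, 0.269, 0.196),
    (-0.812, -0.525, 1.0, 0.216, 0.209),
    (-0.995, -0.093, 1.0, 0.046, 0.215),
    (-0.915, 0.365, 1.0, -0.210, 0.107),
    (-0.591, 0.725, 1.0, -0.478, 0.018),
    (-0.113, 0.895, 1.0, -0.663, 0.008),
    (0.386, 0.831, 1.0, -0.683, -0.033),
    (0.793, 0.547, 1.0, -0.495, -0.106),
    (0.991, 0.120, 1.0, -0.118, -0.008),
]

s = [sum(row) for row in rows]   # control sums s_k of the conditional equations
columns = list(zip(*rows))       # columns a, b, c, d, l
m = len(columns) - 1             # number of unknowns

# First check: [s] must equal [a] + [b] + [c] + [d] + [l].
column_sums = [sum(col) for col in columns]
first_check = sum(s) - sum(column_sums)

# Coefficients and free terms of the normal equations, one row per equation.
normal = [[bracket(columns[i], columns[j]) for j in range(m + 1)] for i in range(m)]

# Second check (16.25): the sum of each row must equal [as], [bs], [cs], [ds].
control = [bracket(columns[i], s) for i in range(m)]
second_check = [sum(normal[i]) - control[i] for i in range(m)]
```

If one entry of `normal` is altered, only the rows of `second_check` containing that bracket depart from zero, which is how the check localizes an error.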


In almost all problems, the coefficients and free terms in the 
conditional equations are approximate numbers in which only the 
digits written down are correct (and even these are not always 
correct). Frequently, the last digit is unreliable in the sense that 
it may differ by several units from the one that would be obtained 
in a more exact calculation. Therefore, by no means all the digits 
that are written in the last matrix are correct. As we know from 
Part I, the number of certain digits in the product of two numbers 
cannot be greater than the smaller number of certain digits in the 
original factors. When the products of the pairs are added, the 
limiting absolute errors of the addends are added, and it is now 
not the number of reliable digits but rather the number of digits 
after the decimal point in the addends that has an effect on the 
error in the result. Therefore, the question of evaluating the 
error in the coefficients of the normal equations is somewhat 
tedious to solve. On the other hand, we know that the errors in- 
curred in rounding off (which is what we should deal with in the 
present case) can be either positive or negative and, in the majority 
of operations, they may partially offset each other. Therefore, 
estimates of the errors based on the maximum possible error are 




almost always exaggerated, and it is advisable to leave more 
digits in the coefficients of the normal equations than would be 
done if we were trying to find the maximum error. 

In the example that we have been examining, the elements of 
the matrix of the conditional equations are given with three digits 
to the right of the decimal point. Any system can be reduced to a 
similar form by an obvious substitution of the unknowns. For 
example, if in the system 

$$a_k x + b_k y + l_k = 0, \qquad k = 1, 2, \ldots, n,$$

the numbers $a_k$ are given with three digits to the right of the 
decimal and the $b_k$ are given with two, we only need to introduce 
a new unknown $\eta = 10y$ in order to have the coefficients of $\eta$ have 
three digits to the right of the decimal also. In such cases, we 
often assume that the coefficients of the normal equations cannot 
have a greater number of certain digits to the right of the decimal 
than have the original numbers. Accordingly, the coefficients 
obtained for the normal equations are rounded off. 


In the present example, after the numbers that we have obtained are rounded off to 
three digits to the right of the decimal, we have the following diagrammatically written 
system of normal equations: 

     x         y         z         u         L         S

   6.909     0.081     0.870    -0.440    -0.630     6.790
   0.081     4.936     0.070    -2.510    -0.523     2.054
   0.870     0.070    13.000    -1.739     0.823    13.024
  -0.440    -2.510    -1.739     1.624     0.190    -2.875


The column S is included because these numbers will afterwards be used to check the 
solution of the system of normal equations. The result of rounding the number S off 
can differ from the sum of the rounded coefficients of the normal equations by one or 
two units of the last digit. This is true because the absolute value of the error in S that 
results from rounding off does not exceed one half the unit of the third digit to the right 
of the decimal in this example. The same is true for each coefficient. Since we have five of 
them, the limiting error of the sum of the coefficients is equal to $2.5 \cdot 10^{-3}$, the limiting 
error of S is equal to $0.5 \cdot 10^{-3}$, and the limiting error of the difference is equal to 
$3 \cdot 10^{-3}$. This calculation emphasizes the possibility that we mentioned earlier of there 
being a discrepancy between S and the sum of the coefficients. The use of S to check 
the solution of the normal equations is possible only if S and the sum of the coefficients 
are exactly equal. Therefore, we must ensure this equality by changing either a number 
in the column S or the coefficients in the normal equations. The latter procedure is 
more accurate since the error in S is less, as can be seen from the calculations that we 
have just performed. However, this course of action is more indefinite because we do not 
know which coefficient or coefficients should be changed. Therefore, it is S that is 
usually changed.
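
A minimal sketch of this rounding step follows. The helper names `round3` and `reconcile` are hypothetical, and round-half-even is an assumed rounding mode (the text does not prescribe one). Each row is rounded to three decimals, S is reset to the sum of the rounded coefficients, and the size of the change is compared with the bound $3 \cdot 10^{-3}$ derived above.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# First and last rows of the six-decimal matrix: x, y, z, u, L, S.
exact_rows = [
    ["6.909156", "0.080654", "0.870000", "-0.439975", "-0.630230", "6.789605"],
    ["-0.439975", "-2.509864", "-1.739000", "1.623981", "0.189828", "-2.875030"],
]

def round3(value):
    """Round a decimal string to three digits after the decimal point."""
    return Decimal(value).quantize(Decimal("0.001"), rounding=ROUND_HALF_EVEN)

def reconcile(row):
    """Round a row and replace S by the sum of the rounded coefficients."""
    *entries, s_rounded = [round3(v) for v in row]
    s_new = sum(entries)
    return entries, s_new, abs(s_new - s_rounded)

BOUND = Decimal("0.003")  # limiting error of the difference found in the text
results = [reconcile(row) for row in exact_rows]
for entries, s_new, change in results:
    print(s_new, change, change <= BOUND)
```

Here it is S that is adjusted, following the usual practice mentioned in the text; adjusting the coefficients instead would need a rule for choosing which coefficient to change.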


84. THE SOLUTION OF A SYSTEM OF LINEAR 
NORMAL EQUATIONS 


If we need to find only the values of the unknowns, we may solve 
the system of normal equations by any of the familiar methods of 




solving systems of linear equations. In particular, we may use the 
method of iteration if the number of unknowns is great. 

However, it was noted in Section 78 that we need to apply the 
method of least squares in natural-science problems for the 
reason that the numbers $l_k$ contain random errors. The roots of 
the normal equations, which we shall consider as approximate 
values of the unknowns, are functions of the random variables $l_k$ 
and hence are themselves random. To determine their possible 
variance, we need to find the mean square errors of the unknowns. 
When we are confronted with such a problem, we usually use only 
two methods: (1) the method of determinants and (2) the method 
of successive elimination of unknowns, usually called Gauss’ 
method.* 

The method of solving systems of linear equations by deter- 
minants is generally known and, therefore, we shall not stop to 
explain it. We only note that direct evaluation of determinants of 
order higher than four is an extremely laborious process. There- 
fore, systems of higher order than four are usually not solved by 
means of determinants. 

The method of successive elimination of the unknowns is often 
used even with systems containing three or four unknowns and is 
always used when the number is greater than four. 

This method is quite simple. We rewrite the first normal 
equation so as to express the first unknown in terms of the remaining 
unknowns, and we call the equation in this form the first elimination 
equation.** In the case of a system containing four unknowns, 
it will be of the form 

$$x = -\frac{[ab]}{[aa]}\,y - \frac{[ac]}{[aa]}\,z - \frac{[ad]}{[aa]}\,u - \frac{[al]}{[aa]}. \tag{16.26}$$

This equation is schematically written as follows: In the column 
labeled y in the system of normal equations, we write the 
number $-\frac{[ab]}{[aa]}$; in the following column, we write the number 
$-\frac{[ac]}{[aa]}$; and so on to the end. We note (for the moment without 
explanation) that we must perform the same operations with the 
numbers in the column S as with the numbers in the column L. 
In the present case, we must write the number $-\frac{[as]}{[aa]}$ in the 
column S. Here, $[as]$ is the topmost number in the column S. If 
we substitute this expression for x in all the other equations, we 
obtain an intermediary system with one fewer unknown. We write 
the coefficients of the intermediary system in the form of the 
following matrix: 


*This name is somewhat ill-chosen. The essentials of this method of solving systems 
of linear equations were apparently known to the Arabs. The part that Gauss played was 
not in discovering the method but in showing that in using it one could, at the same time, 
determine the weights of the unknowns. 

**It is often advisable to arrange the equations and the unknowns in such a way that 
the coefficient of the first unknown is the greatest of the diagonal elements. 




      y        z        u        L        S
    [bb1]    [bc1]    [bd1]    [bl1]    [bs1]
    [cb1]    [cc1]    [cd1]    [cl1]    [cs1]
    [db1]    [dc1]    [dd1]    [dl1]    [ds1]


If we actually make the substitution of the first elimination equa- 
tion, we obtain, after combining the coefficients of like unknowns, 
the following expressions for the coefficients: 


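$$\begin{aligned}
{[bb1]} &= [bb] + \left\{ -\frac{[ab]}{[aa]}\,[ba] \right\}, \\[4pt]
{[bc1]} &= [bc] + \left\{ -\frac{[ac]}{[aa]}\,[ba] \right\}, \\[4pt]
{[bd1]} &= [bd] + \left\{ -\frac{[ad]}{[aa]}\,[ba] \right\}, \\[4pt]
{[bl1]} &= [bl] + \left\{ -\frac{[al]}{[aa]}\,[ba] \right\}, \\[4pt]
{[cb1]} &= [cb] + \left\{ -\frac{[ab]}{[aa]}\,[ca] \right\},
\end{aligned} \tag{16.27}$$

etc. 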

There is a mnemonic for checking all these formulas. The 
first term on the right contains the same letters as the term on 
the left without the 1. The second term on the right must be such 
that after we perform a (formal!) cancellation we get the same 
term again (with a minus sign). After all formal cancellations and 
collection of ‘‘like’’ terms, the right side must be equal to 0. 

Let us represent the coefficient matrix of the normal equations 
with the control column S by dots. We write the elimination row 
under this matrix and denote its elements by hollow circles: 


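[Diagram: under the column heads x, y, z, u, L, S, the four rows of the matrix of normal equations are shown as dots; beneath them, the elimination row is shown as hollow circles in the columns y, z, u, L, S; beneath that, the three rows of the intermediary system are shown as asterisks. Square boxes mark the numbers that enter into one application of the rule described below.]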


The matrix of the intermediary system with three unknowns is 
denoted by asterisks. We can give a simple diagrammatic rule for 
forming the numbers represented by the asterisks. The first row 




of the original matrix is discarded: it has been used for obtaining 
the row with the hollow circles (the elimination row). To obtain the 
mth row in the asterisk matrix, we must add to each number (other 
than the first) of the mth remaining row of dots the product of the number 
in the elimination row appearing in the same column and the first 
number of that row of dots. In the diagram, the original numbers 
and those obtained from them are shown in square boxes. 
Such an operation must be performed in all columns (except the 
first in the original matrix) and in all rows (except the first in the 
original matrix). This rule is sometimes called the right-triangle 
rule, which can be explained by the drawing in our conventional 
diagram. The number at the right angle of the triangle must be 
added to the product of the numbers at the ends of the hypotenuse. 
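
One step of the right-triangle rule can be sketched as follows. The function name `eliminate_first` and the two-unknown example system are hypothetical; each row holds the coefficients, then the free term L, then the control sum S, and exact fractions are assumed so that the checks hold without rounding.

```python
from fractions import Fraction

def eliminate_first(rows):
    """Return (elimination_row, intermediary_rows) for rows [coefficients..., L, S]."""
    first = rows[0]
    # Elimination row (the hollow circles): -[ab]/[aa], -[ac]/[aa], ..., -[as]/[aa].
    elim = [-v / first[0] for v in first[1:]]
    # Right-triangle rule: number at the right angle + product of the hypotenuse ends.
    reduced = [
        [row[j + 1] + elim[j] * row[0] for j in range(len(elim))]
        for row in rows[1:]
    ]
    return elim, reduced

# Hypothetical system: 2x + y - 4 = 0, x + 3y - 7 = 0, with S as the row sums.
rows = [
    [Fraction(2), Fraction(1), Fraction(-4), Fraction(-1)],
    [Fraction(1), Fraction(3), Fraction(-7), Fraction(-3)],
]
elim, reduced = eliminate_first(rows)
print(elim)     # elimination row for the columns y, L, S
print(reduced)  # intermediary system for the columns y, L, S
```

In this example the S entry of the elimination row is 1 less than the sum of its other entries, and the S entry of the intermediary row equals the sum of the other entries of that row.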

The system obtained by eliminating the first unknown possesses 
the two properties of a normal system referred to above: the 
matrix of the coefficients of the unknowns is symmetric about the 
principal diagonal and the numbers along the principal diagonal 
are all positive. However, it is advisable to calculate both of the mutually 
symmetric elements ($[bc1]$ and $[cb1]$, etc.). Since they are not 
obtained by exactly the same operations, a discrepancy between 
them indicates the presence of an error only if the difference is 
greater than two or three units of the last digit. When the numbers 
are rounded off, a difference of one or two units in the last digit is 
possible. In such a case, we have to change one or both of them in 
such a way that there will be exact equality. 

The accuracy in setting up the elimination row is checked in the 
column S if we perform the same operations in it as in the other 
columns. The number in the column S must be equal to the sum of 
all the other numbers of this same elimination row minus 1. If, 
because of variation in rounding off the numbers, there is a difference 
of one or two units in the last digits (or sometimes even more 
than this if there are many unknowns), we need to change the number 
in the column S of the elimination row in such a way that there 
will be exact equality in the check. This ensures reliability of the 
check in subsequent calculations. 

The first intermediary system obtained after elimination of the 
first unknown can also be checked by the numbers in the column S. 
It is easy to show that each of the numbers in this column should 
be equal to the sum of the remaining numbers in the same row. 
Again, a discrepancy in the units of the last digit is possible and 
must be removed by changing the number in the column S. 

After the first unknown is eliminated, we have a system of 
normal equations with one fewer unknown than in the original 
system. The second unknown is eliminated in the same manner 
as the first unknown was eliminated from the first system, and 
the checks are made in the same manner. The elimination equation 
is of the form* 


*Before eliminating the second unknown, it is sometimes advisable to change the 
order of the equations and the unknowns in such a way that the coefficient of the first 
unknown in the intermediary system will be the greatest number in the principal diagonal. 




$$y = -\frac{[bc1]}{[bb1]}\,z - \frac{[bd1]}{[bb1]}\,u - \frac{[bl1]}{[bb1]}. \tag{16.28}$$


Only the coefficients are written in the diagram in their respective 
positions. The substitution of this expression for y into all the 
equations in the first intermediary system (other than the first 
equation of this system) will give a system with two fewer unknowns 
than the original system. From a system with four unknowns, we 
obtain a system of the form 


      z        u        L        S
    [cc2]    [cd2]    [cl2]    [cs2]
    [dc2]    [dd2]    [dl2]    [ds2]


The coefficients are calculated from formulas similar to formulas 
(16.27) for a system with three unknowns: 


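$$\begin{aligned}
{[cc2]} &= [cc1] + \left\{ -\frac{[bc1]}{[bb1]}\,[cb1] \right\}, \\[4pt]
{[cd2]} &= [cd1] + \left\{ -\frac{[bd1]}{[bb1]}\,[cb1] \right\}, \\[4pt]
{[cl2]} &= [cl1] + \left\{ -\frac{[bl1]}{[bb1]}\,[cb1] \right\}, \\[4pt]
{[dc2]} &= [dc1] + \left\{ -\frac{[bc1]}{[bb1]}\,[db1] \right\},
\end{aligned} \tag{16.29}$$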


etc. Here again, the calculations are made according to the right-triangle 
rule. We must calculate $[cs2]$ and $[ds2]$ to check the new 
intermediary system. We also must check the elimination row 
(immediately after it is formed) by the formula 

$$-1 + \left\{ -\frac{[bc1]}{[bb1]} \right\} + \left\{ -\frac{[bd1]}{[bb1]} \right\} + \left\{ -\frac{[bl1]}{[bb1]} \right\} = -\frac{[bs1]}{[bb1]}, \tag{16.30}$$

that is, the number obtained in the elimination row in the column S 
must be 1 less than the sum of the other numbers in the elimination 
row. 

The equations of the second intermediary system are checked 
in exactly the same manner as the equations of the first intermediary 
system and the equations of the original system, that is, 
by means of the numbers in the column S. 

We then get from the first equation of the second intermediary 
system the following elimination equation: 


$$z = -\frac{[cd2]}{[cc2]}\,u - \frac{[cl2]}{[cc2]}. \tag{16.31}$$


In the column u, we write the coefficient of u. In the column L, 
we write the free term. In the column S, we write the number 
$-\frac{[cs2]}{[cc2]}$. When the expression for z is substituted into the next 
equation, we obtain an equation of the form 

$$[dd3]\,u + [dl3] = 0 \tag{16.32}$$

with one unknown, namely u. Diagrammatically, it is written in the 
u and L columns as follows: 

      u        L
    [dd3]    [dl3]

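The control number $[ds3]$ is written in the column S. The coefficients 
are calculated from the formulas 

$$\begin{aligned}
{[dd3]} &= [dd2] + \left\{ -\frac{[cd2]}{[cc2]}\,[dc2] \right\}, \\[4pt]
{[dl3]} &= [dl2] + \left\{ -\frac{[cl2]}{[cc2]}\,[dc2] \right\}, \\[4pt]
{[ds3]} &= [ds2] + \left\{ -\frac{[cs2]}{[cc2]}\,[dc2] \right\}.
\end{aligned} \tag{16.33}$$

The check of equation (16.32) is made from the formula 

$$[dd3] + [dl3] = [ds3].$$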
From equation (16.32) for u, we obtain the last elimination equation 


$$u = -\frac{[dl3]}{[dd3]}. \tag{16.34}$$

Here, u denotes the root of the system of normal equations, that 
is, the most probable value of the unknown u. Substituting the value 
that we have found for u into the elimination equation (16.31), 
we obtain 

$$z = -\frac{[cd2]}{[cc2]}\,u - \frac{[cl2]}{[cc2]}. \tag{16.35}$$

Substitution of u and z into the elimination equation (16.28) yields 

$$y = -\frac{[bc1]}{[bb1]}\,z - \frac{[bd1]}{[bb1]}\,u - \frac{[bl1]}{[bb1]}. \tag{16.36}$$

Finally, the first elimination equation (16.26) yields 

$$x = -\frac{[ab]}{[aa]}\,y - \frac{[ac]}{[aa]}\,z - \frac{[ad]}{[aa]}\,u - \frac{[al]}{[aa]}. \tag{16.37}$$


The values of the unknowns are checked on the basis of the 
following simple consideration. Suppose that we make the following 
change of variables in the normal equations: 

$$x = \xi + 1, \qquad y = \eta + 1, \qquad z = \zeta + 1, \qquad u = \nu + 1.$$




The first normal equation will be of the form 

$$[aa]\,\xi + [ab]\,\eta + [ac]\,\zeta + [ad]\,\nu + \{[aa] + [ab] + [ac] + [ad] + [al]\} = 0.$$

In the transformed equation, the term without an unknown is equal 
to $[as]$, in agreement with the control formula for the first normal 
equation. The other equations are transformed analogously. From 
this, it follows that if the column L is replaced with S in the 
normal equations, we obtain a system of equations for determining 
$\xi$, $\eta$, $\zeta$, and $\nu$, that is, the unknowns that are exactly 1 less than 
the unknowns x, y, z, and u. When the same operations are carried 
out in the column S as in the column L, the system with the unknowns 
$\xi$, $\eta$, $\zeta$, and $\nu$ is in effect solved. Therefore, in solving the 
system of normal equations, the same operations are performed 
simultaneously with the column S and with the column L, which 
gives x, y, z, and u. The column S yields the values $\xi$, $\eta$, $\zeta$, and 
$\nu$, which are used as a check. If there are no errors, they must be 
exactly 1 less than x, y, z, and u. Ordinarily, the presence of 
errors in rounding off will make x and $\xi$, etc. differ by slightly 
more or slightly less than 1. The amount by which this difference 
may deviate from 1 increases with the number of unknowns. 
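
The whole procedure, including the control column, can be sketched as below. This is an illustration rather than the book's computing scheme: the function name `gauss_with_control` is hypothetical, exact fractions are assumed (so the checks hold exactly rather than to within rounding), and the data are the rounded normal equations of the example, with 1.624 taken as the coefficient of u in the fourth equation (the value consistent with the control sum -2.875).

```python
from fractions import Fraction

def gauss_with_control(N, L):
    """Solve N x + L = 0 by successive elimination, carrying a control column S."""
    m = len(N)
    rows = [[Fraction(v) for v in N[i]] + [Fraction(L[i])] for i in range(m)]
    for row in rows:
        row.append(sum(row))                      # S = row sum, free term included
    elim_rows = []
    for i in range(m):
        pivot = rows[i][i]
        elim = [-v / pivot for v in rows[i]]      # elimination row (entry i is -1)
        elim_rows.append(elim)
        for r in range(i + 1, m):
            first = rows[r][i]
            rows[r] = [rows[r][j] + elim[j] * first for j in range(m + 2)]
            # Control: S must equal the sum of the remaining numbers in the row.
            assert rows[r][m + 1] == sum(rows[r][:m + 1])
    x, xi = [Fraction(0)] * m, [Fraction(0)] * m
    for i in reversed(range(m)):                  # back substitution
        tail = range(i + 1, m)
        x[i] = elim_rows[i][m] + sum(elim_rows[i][j] * x[j] for j in tail)
        xi[i] = elim_rows[i][m + 1] + sum(elim_rows[i][j] * xi[j] for j in tail)
    return x, xi

N = [["6.909", "0.081", "0.870", "-0.440"],
     ["0.081", "4.936", "0.070", "-2.510"],
     ["0.870", "0.070", "13.000", "-1.739"],
     ["-0.440", "-2.510", "-1.739", "1.624"]]
L = ["-0.630", "-0.523", "0.823", "0.190"]
x, xi = gauss_with_control(N, L)
print([float(v) for v in x])                      # roots of the normal equations
print([v - w == 1 for v, w in zip(x, xi)])        # each control value is 1 less
```

With rounded decimal arithmetic in place of fractions, the differences between `x` and `xi` would deviate slightly from 1, as the text notes.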

We note the following fact, which is important for calculating 
the mean errors of the unknowns. In solving a system of normal 
equations by Gauss’ method, one should perform only those arith- 
metic operations that are necessary for that purpose. One should 
not multiply or divide the original or the intermediary equations by 
any numbers. 


Methods for solving a system of normal equations by the method of successive elimination 
of the unknowns other than the one that we have been examining are also used. For 
example, the so-called Gauss-Doolittle method is widely used in geodesy. Here, the 
amount of writing is considerably less than in the above method. Only the elimination 
rows and one (the first) column in each intermediary system are written out (see, for 
example, Romanovskii, Matematicheskaya statistika). A convenient procedure for 
arranging the work in this case has been given by P. T. Reznikovskii. S. G. Makover 
developed a matrix method of solving systems of normal equations that leads to a computational 
procedure analogous to the Gauss-Doolittle procedure. 

Still other methods, besides those of successive elimination and determinants, are 
applicable to a system of normal equations. A detailed exposition can be found in the 
book by V. N. Faddeeva. However, not all of these methods are applicable if we are 
required to find not only the values of the unknowns but also their weights. 


85. CALCULATION OF THE WEIGHTS OF THE UNKNOWNS 


As was mentioned in Section 81, the most probable values of the 
unknowns, which we can obtain from a system of linear normal 
equations, are linear functions of the random variables $l_1, l_2, \ldots, l_n$. 
To obtain these functions, we write the solution of the system of 
normal equations in terms of the determinants of the augmented 
and coefficient matrices (in the case of a system of four unknowns): 

$$x = -\frac{D_1}{D}, \qquad y = -\frac{D_2}{D}, \qquad z = -\frac{D_3}{D}, \qquad u = -\frac{D_4}{D}, \tag{16.38}$$


where D is the determinant of the system 


$$D = \begin{vmatrix}
[aa] & [ab] & [ac] & [ad] \\[2pt]
{[ba]} & [bb] & [bc] & [bd] \\[2pt]
{[ca]} & [cb] & [cc] & [cd] \\[2pt]
{[da]} & [db] & [dc] & [dd]
\end{vmatrix}, \tag{16.39}$$


and $D_1$ is a determinant of the form 

$$D_1 = \begin{vmatrix}
[al] & [ab] & [ac] & [ad] \\[2pt]
{[bl]} & [bb] & [bc] & [bd] \\[2pt]
{[cl]} & [cb] & [cc] & [cd] \\[2pt]
{[dl]} & [db] & [dc] & [dd]
\end{vmatrix}. \tag{16.40}$$

$D_2$, $D_3$, and $D_4$ have analogous expressions except that the column 
of free terms in the normal equations is, successively, in the 
second, third, and fourth position. 

If we expand the terms in the first column of the determinant 
$D_1$, we obtain 

$$D_1 = \sum_{k=1}^{n} l_k \begin{vmatrix}
a_k & [ab] & [ac] & [ad] \\
b_k & [bb] & [bc] & [bd] \\
c_k & [cb] & [cc] & [cd] \\
d_k & [db] & [dc] & [dd]
\end{vmatrix}. \tag{16.41}$$

From (16.38) and (16.41), we obtain the desired linear expression 
for x in terms of the numbers $l_k$: 

$$x = -\frac{1}{D} \sum_{k=1}^{n} A_k l_k, \tag{16.42}$$

where 

$$A_k = \begin{vmatrix}
a_k & [ab] & [ac] & [ad] \\
b_k & [bb] & [bc] & [bd] \\
c_k & [cb] & [cc] & [cd] \\
d_k & [db] & [dc] & [dd]
\end{vmatrix}. \tag{16.43}$$

Let us now find the variance of the unknown quantity x, treating 
the numbers $l_k$ as random. From the theorem on the variance of a 
linear function of mutually independent random variables $l_k$, we 
obtain 

$$\sigma_x^2 = \frac{1}{D^2} \sum_{k=1}^{n} A_k^2 \sigma_k^2, \tag{16.44}$$



where $\sigma_k^2$ are the variances of the successive conditional equations. 
We shall assume the conditional equations to be equally precise 
(or reducible to equally precise equations). Therefore, 

$$\sigma_k^2 = \sigma_0^2, \qquad k = 1, 2, \ldots, n, \tag{16.45}$$

where $\sigma_0$ is the mean square error of an equation of unit weight. 
Thus, for equally precise conditional equations, 

$$\sigma_x^2 = \frac{\sigma_0^2}{D^2}\,A, \tag{16.46}$$

where $A = \sum_{k=1}^{n} A_k^2$ and $A_k$ is determined from formula (16.43). To get 
an expression for $\sigma_x^2$, we must find the quantity $A$. We use (16.43) 
to rewrite $A$ in the form 

$$A = \sum_{k=1}^{n} A_k A_k = \sum_{k=1}^{n} A_k \begin{vmatrix}
a_k & [ab] & [ac] & [ad] \\
b_k & [bb] & [bc] & [bd] \\
c_k & [cb] & [cc] & [cd] \\
d_k & [db] & [dc] & [dd]
\end{vmatrix}.$$

Using the rule for multiplying a determinant by a real number and 
the rule for adding determinants, we obtain 

$$A = \begin{vmatrix}
\sum a_k A_k & [ab] & [ac] & [ad] \\
\sum b_k A_k & [bb] & [bc] & [bd] \\
\sum c_k A_k & [cb] & [cc] & [cd] \\
\sum d_k A_k & [db] & [dc] & [dd]
\end{vmatrix}. \tag{16.47}$$

We know from algebra that a determinant with two identical columns 
is equal to zero. Therefore, 

$$\sum_{k=1}^{n} b_k A_k = \begin{vmatrix}
\sum a_k b_k & [ab] & [ac] & [ad] \\
\sum b_k b_k & [bb] & [bc] & [bd] \\
\sum c_k b_k & [cb] & [cc] & [cd] \\
\sum d_k b_k & [db] & [dc] & [dd]
\end{vmatrix} = 0$$




and analogously, 

$$\sum_{k=1}^{n} c_k A_k = 0, \qquad \sum_{k=1}^{n} d_k A_k = 0.$$

Furthermore, we have 

$$\sum_{k=1}^{n} a_k A_k = \begin{vmatrix}
\sum a_k a_k & [ab] & [ac] & [ad] \\
\sum a_k b_k & [bb] & [bc] & [bd] \\
\sum a_k c_k & [cb] & [cc] & [cd] \\
\sum a_k d_k & [db] & [dc] & [dd]
\end{vmatrix} = D.$$

If we substitute the sums that we have found into the determinant 
$A$ of (16.47) and expand it in elements of its first column, we obtain 

$$A = D \cdot D_{11}, \tag{16.48}$$

where $D_{11}$ is the cofactor of the first element in the diagonal of the 
determinant D of the system of normal equations. 


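If we substitute (16.48) into the formula (16.46), we finally obtain 

$$\sigma_x^2 = \frac{D_{11}}{D}\,\sigma_0^2. \tag{16.49}$$

From the definition of weights, we have 

$$p_x = \frac{D}{D_{11}},$$

and analogously, 

$$p_y = \frac{D}{D_{22}}, \qquad p_z = \frac{D}{D_{33}}, \qquad p_u = \frac{D}{D_{44}}, \tag{16.50}$$

where $D_{22}$, $D_{33}$, and $D_{44}$ are the cofactors of successive elements in 
the principal diagonal of the determinant of the system of normal 
equations. 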

Thus, we may formulate the following rule for finding the weights 
by means of determinants, 

The weight of each unknown is equal to the determinant of the 
system of normal equations divided by the cofactor of that element 
of the diagonal that is the coefficient of the unknown in question. 
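
As a small illustration of this rule, the sketch below computes the weights $D / D_{ii}$ for a hypothetical symmetric system of two unknowns. The function names `det` and `weights_by_determinants` are invented for the example, and the cofactor expansion used is practical only for the small systems to which the rule is suited.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not M:
        return Fraction(1)
    total = Fraction(0)
    for j, v in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * v * det(minor)
    return total

def weights_by_determinants(N):
    """Weight of unknown i is D / D_ii, with D_ii the cofactor of the i-th diagonal element."""
    N = [[Fraction(v) for v in row] for row in N]
    D = det(N)
    weights = []
    for i in range(len(N)):
        # Deleting row i and column i gives the cofactor of a diagonal element (sign +).
        minor = [row[:i] + row[i + 1:] for k, row in enumerate(N) if k != i]
        weights.append(D / det(minor))
    return weights

# Hypothetical normal equations: [aa] = 2, [ab] = 1, [bb] = 3.
weights = weights_by_determinants([[2, 1], [1, 3]])
print(weights)  # weights of x and y
```

For this example D = 5, so the weights are 5/3 and 5/2; their reciprocals, 3/5 and 2/5, are the diagonal elements of the inverse matrix.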




This is a good rule to use if the system of normal equations 
itself is solved by means of determinants, which, as we stated 
earlier, is expedient only for systems with two or three unknowns. 

Suppose now that the system of normal equations is solved by 
Gauss’ method. We shall show that in this case the rule given for 
calculating the weights is applicable not only to the original normal 
system, but also to any of the intermediary systems.* 

On the determinant D of the system of normal equations, we 
perform transformations analogous to those that we made in setting 
up the first intermediary system. We divide the first row of this 
determinant by $[aa]$, and we then subtract this row from rows II, 
III, and IV, multiplying it respectively by $[ba]$, $[ca]$, and $[da]$. We 
then obtain 

$$D = [aa] \begin{vmatrix}
1 & \dfrac{[ab]}{[aa]} & \dfrac{[ac]}{[aa]} & \dfrac{[ad]}{[aa]} \\[6pt]
0 & [bb1] & [bc1] & [bd1] \\
0 & [cb1] & [cc1] & [cd1] \\
0 & [db1] & [dc1] & [dd1]
\end{vmatrix}, \tag{16.51}$$

that is, 

$$D = [aa] \cdot D^{(1)}, \tag{16.52}$$

where $D^{(1)}$ denotes the determinant of the first intermediary system. 
Analogous transformations of the cofactors of the diagonal 
elements of the determinant D yield 

$$D_{22} = [aa]\,D^{(1)}_{11}, \qquad D_{33} = [aa]\,D^{(1)}_{22}, \qquad D_{44} = [aa]\,D^{(1)}_{33},$$

where $D^{(1)}_{11}$, $D^{(1)}_{22}$, and $D^{(1)}_{33}$ are the cofactors of the elements $[bb1]$, 
$[cc1]$, and $[dd1]$ of the determinant $D^{(1)}$. From these formulas and 
(16.50), we find 

$$p_y = \frac{D^{(1)}}{D^{(1)}_{11}}, \qquad p_z = \frac{D^{(1)}}{D^{(1)}_{22}}, \qquad p_u = \frac{D^{(1)}}{D^{(1)}_{33}}. \tag{16.53}$$


Thus, the assertion made is proven for the first intermediary 
system. In just the same manner, it is proven for the remaining 
intermediary systems. We now use this fact to prove the following 
theorem of Gauss: 

In solving a system of normal equations by the Gauss method, 
the coefficient of the last unknown in the last intermediary equation 
(which contains no other unknown) is equal to its weight, provided 
only those operations that are necessary for this method have 
been performed. 


*The proof of this fact and the proof to be given below of Gauss’ theorem are due to 
P. T. Reznikovskii. 




Proof: Let us use the rule that we have formulated to obtain 
the weights of the next-to-the-last and the last unknowns that appear 
in the next-to-the-last intermediary system: 

$$p_z = \frac{1}{[dd2]} \begin{vmatrix} [cc2] & [cd2] \\[2pt] {[dc2]} & [dd2] \end{vmatrix}, \qquad
p_u = \frac{1}{[cc2]} \begin{vmatrix} [cc2] & [cd2] \\[2pt] {[dc2]} & [dd2] \end{vmatrix}.$$

From the formula given in the preceding section for the coefficient 
$[dd3]$ of the last intermediary equation, it follows that 

$$\begin{vmatrix} [cc2] & [cd2] \\[2pt] {[dc2]} & [dd2] \end{vmatrix} = [dd3]\,[cc2].$$

Therefore, 

$$p_z = \frac{[dd3]\,[cc2]}{[dd2]}, \qquad p_u = [dd3]. \tag{16.54}$$


This proves Gauss’ theorem. 
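
Formulas (16.54) can be tried on a small case. In the sketch below, the function name `intermediary_systems` and the three-unknown matrix are hypothetical; the function keeps every intermediary coefficient matrix, so that the last coefficient and the diagonal of the last two-equation system can be read off directly.

```python
from fractions import Fraction

def intermediary_systems(N):
    """Return [N, first intermediary matrix, second, ...] produced by Gauss elimination."""
    systems = [[[Fraction(v) for v in row] for row in N]]
    while len(systems[-1]) > 1:
        A = systems[-1]
        systems.append([
            [A[r][c] - A[0][c] * A[r][0] / A[0][0] for c in range(1, len(A))]
            for r in range(1, len(A))
        ])
    return systems

# Hypothetical normal-equation matrix for three unknowns.
N = [[4, 2, 1],
     [2, 5, 3],
     [1, 3, 6]]
systems = intermediary_systems(N)

last_two = systems[-2]                 # two equations in the last two unknowns
p_last = systems[-1][0][0]             # Gauss' theorem: the last coefficient
p_next = p_last * last_two[0][0] / last_two[1][1]
print(p_last, p_next)
```

For this matrix the determinant rule gives D = 67 with cofactors 16 and 23 for the last two diagonal elements, so the weights should come out as 67/16 and 67/23.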

Thus, in solving a system of normal equations by Gauss’ method, 
the weight of the last unknown is obtained more or less automati- 
cally: it is equal to the coefficient, which must still be calculated. 

The weight of the next-to-the-last unknown is easily obtained 
from the first of formulas (16.54). It is equal to the product of 
the weight of the last unknown and the ratio of the elements of the 
principal diagonal in the next-to-the-last Gauss system consisting 
of the last two equations (with two unknowns). As a rule, the weights 
of the last and the next-to-the-last unknowns are determined, and 
then the order of the unknowns in the equations is changed in such 
a way that the next two unknowns, counting from the end, are last. 
Here, the equations must be rearranged so that the matrix of the 
coefficients of the unknowns will be symmetric, with positive elements 
on the principal diagonal. In other words, the unknowns and 
equations should be rearranged in such a way that the system has 
the properties of a system of normal equations. 

Solution of the transformed system yields the weights of two 
more unknowns. This gives us another check of the solution, 
since we may obtain all the unknowns a second time. If we are sure 
of the results of the first solution, we may confine ourselves to 
setting up the intermediary systems, since this is enough for calculating 
the weights. When we have three or four unknowns, we 
must solve the system twice; when we have five or six unknowns, 
we must solve it three times, etc. 

We can show that the weight of the third-to-the-last unknown is 
given by the formula* 


*We might write analogous formulas for the weights of the fourth, fifth, etc. unknowns 
from the end, but these formulas contain, respectively, third-, fourth-, etc. order determinants, 
and therefore, they are not suitable. 




$$p_y = [dd3]\,\frac{[bb1]\,[cc2]}{\begin{vmatrix} [cc1] & [cd1] \\[2pt] {[dc1]} & [dd1] \end{vmatrix}}. \tag{16.55}$$


If, in addition to formulas (16.54), we apply this formula, we must 
solve a system of normal equations with three unknowns once, a 
system of four, five, or six unknowns twice, etc. 


The method that we have been expounding for determining the weights rests heavily on 
the Gauss procedure for solving systems of normal equations (wherein we write out all the 
intermediary systems). In geodesic literature, this method is called Enke’s method. If 
the number of unknowns in the system of normal equations is greater than six, to determine 
the weights by Enke’s method, we must solve the system of normal equations no 
fewer than three times (four times, in fact, if we use only formulas (16.54)). Therefore, 
for systems with many unknowns, we need to resort to other methods of calculating the 
weights. 

In a method developed by N. I. Idel’son, in addition to the given system of normal 
equations with m unknowns, we solve (simultaneously) m more systems with the same 
determinant and the following sequences of free terms: (-1, 0, 0, ...), (0, -1, 0, ...), 
(0, 0, -1, ...), .... It is easy to show that in each of these systems, one of the unknowns 
is equal to the reciprocal of one of the weights of the unknowns in the system of normal 
equations. The amount of extra calculation necessary to determine the weights by this 
method is approximately equal to the amount used in the basic calculations involved in 
solving a system of normal equations by the method of successive elimination of the 
unknowns. Therefore, this method is definitely more suitable than Enke’s method if the 
number of unknowns is greater than six. 
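
The idea of the auxiliary systems can be sketched as follows. This is not the computing scheme of the method itself: the extra systems are solved here one after another rather than simultaneously, the function names `solve` and `weights_from_unit_systems` are hypothetical, and exact fractions are assumed.

```python
from fractions import Fraction

def solve(N, L):
    """Solve N x + L = 0 by Gauss elimination with exact fractions."""
    m = len(N)
    A = [[Fraction(v) for v in N[i]] + [Fraction(L[i])] for i in range(m)]
    for i in range(m):
        for r in range(i + 1, m):
            f = A[r][i] / A[i][i]
            A[r] = [A[r][j] - f * A[i][j] for j in range(m + 1)]
    x = [Fraction(0)] * m
    for i in reversed(range(m)):
        x[i] = -(A[i][m] + sum(A[i][j] * x[j] for j in range(i + 1, m))) / A[i][i]
    return x

def weights_from_unit_systems(N):
    """Weight i is the reciprocal of unknown i in the system with free terms -e_i."""
    m = len(N)
    weights = []
    for i in range(m):
        free_terms = [-1 if k == i else 0 for k in range(m)]
        q = solve(N, free_terms)
        weights.append(1 / q[i])
    return weights

# Hypothetical normal-equation matrix for three unknowns.
weights = weights_from_unit_systems([[4, 2, 1], [2, 5, 3], [1, 3, 6]])
print(weights)
```

Because all the auxiliary systems share one coefficient matrix, a practical scheme would carry their free-term columns through a single elimination, which is what keeps the extra labor comparable to one solution of the system.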

It follows from (16.53) that the reciprocals of the weights are equal to the diagonal 
elements of the matrix inverse to the matrix of the system of normal equations. Therefore, 
if we apply to the normal equations any of the methods of solution of a system of 
linear algebraic equations that are based on finding the inverse matrix, we shall obtain 
not only the values of the unknowns, but also their weights. The matrix exposition of 
Idel’son’s method given by Makover is based on this fact. 


86. THE APPROXIMATE VALUE OF THE MEAN SQUARE 
ERROR PER UNIT WEIGHT. THE MEAN SQUARE 
ERROR OF THE UNKNOWNS 


After solving the normal equations, we obtain the most probable 
values of the unknowns x, y, z, and u. If we substitute these values 
into the conditional equations, we obtain discrepancies satisfying 
the condition that the sum of the squares be minimized. These 
discrepancies are often called residual errors, reserving the more 
general name discrepancies for the results obtained when arbitrary 
numbers x, y, z, and u are substituted. To show how well the numbers 
found for x, y, z, and u satisfy the conditional equations, we must 
calculate the residual error of each conditional equation. Such a 
calculation presents no difficulties when one is using a calculating 
machine, since we need to find the sum of products, which can be 
done without writing down the individual addends. We denote the 
residual errors* by $\varepsilon_k$ (for k = 1, 2, ..., n) and we shall refer to them 


*Knowledge of all the $\varepsilon_k$ is essential because sometimes there are gross errors in 
the conditional equations. Such equations must be discarded and the calculations made 
again. Conditional equations with gross errors are detected by excessively high residual 
errors in comparison with the great majority of the $\varepsilon_k$. Usually, the three-sigma rule 
is used. 




as the remainders. Let us now determine the sum of the squares 
of the remainders, that is, the minimum of the sum of the squares 
of the discrepancies. We denote this minimum value by $s$. In 
the case of four unknowns, 

$$s = \sum_{k=1}^{n} \varepsilon_k^2 = \sum_{k=1}^{n} (a_k x + b_k y + c_k z + d_k u + l_k)^2. \tag{16.56}$$

The quantity $s$ can easily be calculated directly from the $\varepsilon_k$, since 
the individual $\varepsilon_k$ must be calculated anyway. A rather simple formula 
also exists for calculating $s$. A simple calculation yields 

$$\begin{aligned}
s = {} & x \sum_{k=1}^{n} (a_k^2 x + a_k b_k y + a_k c_k z + a_k d_k u + a_k l_k) \\
& + y \sum_{k=1}^{n} (b_k a_k x + b_k^2 y + b_k c_k z + b_k d_k u + b_k l_k) \\
& + z \sum_{k=1}^{n} (c_k a_k x + c_k b_k y + c_k^2 z + c_k d_k u + c_k l_k) \\
& + u \sum_{k=1}^{n} (d_k a_k x + d_k b_k y + d_k c_k z + d_k^2 u + d_k l_k) \\
& + \sum_{k=1}^{n} l_k^2 + x \sum_{k=1}^{n} a_k l_k + y \sum_{k=1}^{n} b_k l_k + z \sum_{k=1}^{n} c_k l_k + u \sum_{k=1}^{n} d_k l_k.
\end{aligned}$$

Each term is squared, all the terms with doubled products are 
written in different rows in the form of the sum of two identical 
addends, and the terms are factored. In Gauss’ notation, the 
equation obtained can be written as 

$$\begin{aligned}
s = {} & x\,([aa]\,x + [ab]\,y + [ac]\,z + [ad]\,u + [al]) \\
& + y\,([ba]\,x + [bb]\,y + [bc]\,z + [bd]\,u + [bl]) \\
& + z\,([ca]\,x + [cb]\,y + [cc]\,z + [cd]\,u + [cl]) \\
& + u\,([da]\,x + [db]\,y + [dc]\,z + [dd]\,u + [dl]) \\
& + [ll] + x\,[al] + y\,[bl] + z\,[cl] + u\,[dl].
\end{aligned}$$

Since x, y, z, and u are the roots of the normal equations, all the 
sums in the parentheses are equal to 0 and we are left with the 
simple expression 

$$s = x\,[al] + y\,[bl] + z\,[cl] + u\,[dl] + [ll], \tag{16.57}$$

for which we need to perform only the extra calculation of the 
sum of the squares of the numbers $l_k$. 
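
The agreement between formula (16.57) and the direct sum of squares can be illustrated on invented data. In the sketch below, the four conditional equations in two unknowns are hypothetical, the helper `br` (standing for the Gauss bracket) is a name chosen for the example, and exact fractions are assumed so that the two values coincide exactly.

```python
from fractions import Fraction as F

# Hypothetical conditional equations a_k x + b_k y + l_k = 0 (invented data).
a = [F(1), F(1), F(1), F(1)]
b = [F(0), F(1), F(2), F(3)]
l = [F(-1), F(-3), F(-4), F(-8)]

def br(p, q):
    """Gauss bracket [pq]: the sum of the products p_k q_k."""
    return sum(pk * qk for pk, qk in zip(p, q))

# Roots of the two normal equations, found here by determinants.
D = br(a, a) * br(b, b) - br(a, b) ** 2
x = -(br(a, l) * br(b, b) - br(a, b) * br(b, l)) / D
y = -(br(a, a) * br(b, l) - br(a, b) * br(a, l)) / D

# Direct calculation from the remainders, and the analogue of formula (16.57).
remainders = [ak * x + bk * y + lk for ak, bk, lk in zip(a, b, l)]
s_direct = sum(e * e for e in remainders)
s_formula = x * br(a, l) + y * br(b, l) + br(l, l)
print(s_direct, s_formula)
```

With rounded arithmetic the two values would differ slightly, which is the situation discussed in the text that follows.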

When we use this formula to calculate s, the errors resulting 
from rounding off will be considerably less than by direct calcula- 
tion of the sum of the squares of the remainders. Therefore, it 
should be considered the basic formula and a direct calculation 
of the sum of the squares of the errors should be considered as a 
check. There may be a considerable discrepancy between these 




two values of s if » is great. For comparison, one should estimate 
the limiting error of the direct sum of the squares of the residual 
errors. Suppose, for example, that all the remainders are given 
with three reliable digits to the right of the decimal and that the 
number of conditional equations is 10. If the remainders are of the 
order of 0.05, the limiting error ofthe square of a single remainder 
is equal to 


9.0.05-0,5- 107° =0.5- 107”. 


Since there are 10 of them, the limiting error of the direct sum of 
the squares of the discrepancies is equal to 0.5-10~*. If we add to 
this the limiting error of the sum given by formula (16.57), we 
obtain an upper bound on the absolute value of the difference be- 
tween the two values of s. An excess over this value definitely 
indicates an error in one number or the other (sometimes both). 
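The arithmetic of this estimate can be sketched in a few lines of Python. The function name is an assumption made for illustration; the rule used is the one applied above, namely that the limiting error of a square is twice the magnitude of the number times its limiting error.

```python
def limiting_error_of_square(magnitude, abs_error):
    """Limiting absolute error of r**2 when |r| is about `magnitude`
    and r itself is known to within `abs_error`."""
    return 2.0 * magnitude * abs_error

# Three reliable decimals give a limiting error of 0.5e-3 in each remainder.
single = limiting_error_of_square(0.05, 0.5e-3)
total = 10 * single  # ten conditional equations

print(single, total)
```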

After we have calculated the sum of the squares of the remainders,
we can calculate the most probable value of the mean square
error per unit weight $\sigma_0$ by using the formula


$$\sigma_0^2 = \frac{s}{n - m}, \tag{16.58}$$


where n is the number of conditional equations and m is the number
of unknowns.

We give this formula without proof, since the proof is rather 
tedious. It is similar to the proof given for the analogous problem 
for equally precise and unequally precise measurements of a 
single quantity. These problems can be considered a special case 
of the problem discussed in the present chapter. 

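After $\sigma_0$ is calculated, the mean square errors of the unknowns
are determined from the familiar formulas, since the weights of
the unknowns have already been found:


$$\sigma_{\bar{x}} = \frac{\sigma_0}{\sqrt{p_{\bar{x}}}}, \qquad \sigma_{\bar{y}} = \frac{\sigma_0}{\sqrt{p_{\bar{y}}}}, \qquad \text{etc.} \tag{16.59}$$


It is convenient to put the result in the following form:


$$x = \bar{x} \pm \sigma_{\bar{x}}, \qquad y = \bar{y} \pm \sigma_{\bar{y}}, \qquad \text{etc.}$$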

If we assume a normal distribution of the random errors for
the numbers $l_k$, then $\bar{x}, \bar{y}, \ldots$, being linear functions of $l_k$, are also
normally distributed. Therefore, we may calculate probabilities
of the form


$$P(|x - \bar{x}| < \sigma_{\bar{x}}) = 0.68, \qquad P(|x - \bar{x}| < 3\sigma_{\bar{x}}) = 0.9973$$


(and analogously for the other unknowns); that is, we may find
bounds between which the true value lies with a given probability,
or we may solve the inverse problem. If we cannot use a normal
law, analogous probabilities can be found from Chebyshev's
inequalities.
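The two kinds of bounds can be compared numerically. The sketch below, with function names chosen here for illustration, evaluates the probability of a deviation smaller than k mean square errors under the normal law, and the lower bound that Chebyshev's inequality guarantees for any distribution with a finite variance.

```python
import math

def normal_prob_within(k):
    """P(|X - mean| < k*sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2.0))

def chebyshev_lower_bound(k):
    """Lower bound on P(|X - mean| < k*sigma) from Chebyshev's inequality."""
    return max(0.0, 1.0 - 1.0 / k ** 2)

for k in (1, 2, 3):
    print(k, round(normal_prob_within(k), 4), round(chebyshev_lower_bound(k), 4))
```

For k = 3 the normal law gives 0.9973, whereas Chebyshev's inequality guarantees only about 0.89; the price of dropping the normality assumption is a much wider interval for the same probability.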



The bounds obtained in this manner are correct if the number 
of observations is great (approximately twenty or more). If the 
number of observations is small, the results should be evaluated 
by using Student's distribution (see V. I. Romanovskii's book
Osnovnye zadachi teorii oshibok (Basic Problems in the Theory of
Errors)).


87. AN EXAMPLE ILLUSTRATING THE PROCEDURE FOR 
SOLVING SYSTEMS OF LINEAR CONDITIONAL EQUATIONS 


Suppose that we have the following system of thirteen conditional equations with four 
unknowns: 


         x    |    y    |     z    |    u    |    l    |     s    |    ε    |    ε²
      +0.998  | +0.061  |  +1.000  | +0.000  | -0.042  |  +2.017  | -0.015  | 0.000225
      +0.897  | -0.398  |  +1.000  | +0.033  | +0.027  |  +1.559  | +0.003  | 0.000009
      +0.566  | -0.742  |  +1.000  | +0.122  | +0.099  |  +1.045  | +0.010  | 0.000100
      +0.081  | -0.897  |  +1.000  | +0.222  | +0.133  |  +0.539  | -0.020  | 0.000400
      -0.416  | -0.819  |  +1.000  | +0.269  | +0.196  |  +0.230  | +0.000  | 0.000000
      -0.812  | -0.525  |  +1.000  | +0.216  | +0.209  |  +0.088  | +0.001  | 0.000001
      -0.995  | -0.093  |  +1.000  | +0.046  | +0.215  |  +0.173  | +0.033  | 0.001089
      -0.915  | +0.365  |  +1.000  | -0.210  | +0.107  |  +0.347  | -0.018  | 0.000324
      -0.591  | +0.725  |  +1.000  | -0.478  | +0.018  |  +0.674  | -0.034  | 0.001156
      -0.113  | +0.895  |  +1.000  | -0.663  | +0.008  |  +1.127  | +0.024  | 0.000576
      +0.386  | +0.831  |  +1.000  | -0.683  | -0.033  |  +1.501  | +0.027  | 0.000729
      +0.793  | +0.547  |  +1.000  | -0.495  | -0.106  |  +1.739  | -0.038  | 0.001444
      +0.991  | +0.120  |  +1.000  | -0.118  | -0.008  |  +1.985  | +0.028  | 0.000784

Sums  +0.870  | +0.070  | +13.000  | -1.739  | +0.823  | +13.024  | +0.001  | 0.006837


We set up the normal equations, calculating their coefficients by the method of accu- 
mulation, and we write them down in the same type of tabular form as the conditional 
equations: 


      x     |     y     |      z     |     u     |     L     |      S
 +6.909156  | +0.080654 |  +0.870000 | -0.439975 | -0.630230 |  +6.789605
 +0.080654  | +4.936378 |  +0.070000 | -2.509864 | -0.523411 |  +2.053757
 +0.870000  | +0.070000 | +13.000000 | -1.739000 | +0.823000 | +13.024000
 -0.439975  | -2.509864 |  -1.739000 | +1.623981 | +0.189828 |  -2.875030


If the control equations are exactly satisfied when we keep all the digits that we
formally obtain, we can assume that no errors were made in setting up the normal
equations.
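The method of accumulation and the control by the sums s can be sketched in Python as follows. The list `EQUATIONS` holds the thirteen conditional equations as read from the table above (doubtful digits being settled by the control column s); the names `bracket`, `EQUATIONS`, and `normal` are assumptions introduced for this sketch.

```python
def bracket(u, v):
    """Gauss bracket [uv]: the sum of the products of corresponding entries."""
    return sum(p * q for p, q in zip(u, v))

# Coefficients (a, b, c, d) and free term l of each conditional equation.
EQUATIONS = [
    (+0.998, +0.061, 1.0, +0.000, -0.042),
    (+0.897, -0.398, 1.0, +0.033, +0.027),
    (+0.566, -0.742, 1.0, +0.122, +0.099),
    (+0.081, -0.897, 1.0, +0.222, +0.133),
    (-0.416, -0.819, 1.0, +0.269, +0.196),
    (-0.812, -0.525, 1.0, +0.216, +0.209),
    (-0.995, -0.093, 1.0, +0.046, +0.215),
    (-0.915, +0.365, 1.0, -0.210, +0.107),
    (-0.591, +0.725, 1.0, -0.478, +0.018),
    (-0.113, +0.895, 1.0, -0.663, +0.008),
    (+0.386, +0.831, 1.0, -0.683, -0.033),
    (+0.793, +0.547, 1.0, -0.495, -0.106),
    (+0.991, +0.120, 1.0, -0.118, -0.008),
]

columns = list(zip(*EQUATIONS))             # columns a, b, c, d, l
control = [sum(row) for row in EQUATIONS]   # control numbers s_k

# Each normal equation: [.a], [.b], [.c], [.d], [.l], and the control sum [.s].
normal = []
for i in range(4):
    row = [bracket(columns[i], columns[j]) for j in range(5)]
    row.append(bracket(columns[i], control))
    normal.append(row)

# Control equations: the first five entries of each row must add up to the sixth.
control_ok = all(abs(sum(row[:5]) - row[5]) < 1e-9 for row in normal)

for row in normal:
    print(["%+.6f" % value for value in row])
print("control equations satisfied:", control_ok)
```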

Now, we discard those digits in the normal equations that are not completely reliable
and keep only three digits to the right of the decimal in each number. It should be noted
that, according to the rules for estimating errors by means of limiting errors, the third
digit to the right of the decimal may be uncertain. However, we retain it because there
are quite a few operations performed, and we may assume with a probability close
to unity that the actual errors in the coefficients will be considerably less than the
limiting errors.

Let us rewrite the system of normal equations, retaining three digits to the right of
the decimal, and let us write the entire schema for the solution:


      |       x       |       y       |       z       |       u       |    L    |    S
      |     6.909     |     0.081     |     0.870     |    -0.440     | -0.630  |  6.790
      |     0.081     |     4.936     |     0.070     |    -2.510     | -0.523  |  2.054
      |     0.870     |     0.070     |    13.000     |    -1.739     |  0.823  | 13.024
      |    -0.440     |    -2.510     |    -1.739     |     1.624     |  0.190  | -2.875
 (1)  |  x = +0.0973  |    -0.0117    |    -0.1259    |    +0.0637    | +0.0912 | -0.9827
  I   |               |     4.935     |     0.060     |    -2.505     | -0.516  |  1.974
      |               |     0.060     |    12.890     |    -1.684     |  0.902  | 12.168
      |               |    -2.505     |    -1.684     |     1.596     |  0.150  | -2.443
 (2)  |               |  y = +0.0875  |    -0.0122    |    +0.5076    | +0.1046 | -0.4000
      |               | (ȳ = -0.9125) |               |               |         |
 II   |               |               |    12.889     |    -1.654     |  0.908  | 12.143
      |               |               |    -1.654     |     0.325     | -0.112  | -1.441
 (3)  |               |               |  z = -0.0749  |    +0.1283    | -0.0704 | -0.9421
 III  |               |               |               |     0.113     |  0.004  |  0.117
 (4)  |               |               |               |  u = -0.0354  |         |
      |               |               |               | (ū = -1.0354) |         |


Exposition of the calculating procedure: (1) denotes the elimination row of the
unknown x. Written in full, it would be


$$x = -0.0117y - 0.1259z + 0.0637u + 0.0912.$$


The coefficients are obtained by dividing the coefficients of the first normal equation by
6.909 and changing the sign.

I denotes the system of three equations with unknowns y, z, and u. The coefficients
of this system are obtained from the coefficients of the given system by the
right-triangle rule. For example, to obtain the number -1.684, we must add the product of the
numbers 0.870 and 0.0637 at the ends of the hypotenuse to the number -1.739 at the right
angle. (The numbers forming the triangle are shown in boxes in the diagram.) In every
row, the left end of the hypotenuse is constant, namely, the first number of the row.

(2) denotes the elimination row of the unknown y in system I.

II denotes the system with unknowns z and u. It is obtained from I and (2) by the
right-triangle rule.

(3) denotes the elimination row of the unknown z in system II.

III denotes the equation with only one unknown, namely, u. It is obtained from II
and (3) by the same procedure.

The first number in each elimination row is the value of the unknown in question.
These numbers are obtained by the reverse process by means of the elimination rows.
The numbers in parentheses under them are checks. They are obtained by the same
reverse procedure, if we take numbers from the column S instead of the column L in
the elimination rows. In two places in the column S, the last digit is decreased by
unity. This was done for the sake of agreement with the check (the removal of an
admissible difference).
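The procedure just described (elimination rows, the right-triangle rule, the reverse process, and the check by the column S) can be sketched in Python. The function name `solve_with_control` is an assumption made for this sketch. It keeps full floating-point precision instead of rounding every intermediate number to three or four decimals, so the weakly determined unknowns need not reproduce the hand scheme digit for digit; u, for instance, comes out near -0.042 rather than -0.0354.

```python
def solve_with_control(coeffs, free_terms):
    """Solve sum_j coeffs[i][j]*v[j] + free_terms[i] = 0 by successive elimination.

    Returns the unknowns, the control unknowns obtained from the column S,
    and the leading coefficient of each successive system.
    """
    n = len(coeffs)
    rows = [list(coeffs[i]) + [free_terms[i]] for i in range(n)]
    for row in rows:
        row.append(sum(row))  # control column S: coefficients plus free term
    elimination_rows, pivots = [], []
    for k in range(n):
        pivot = rows[k][k]
        pivots.append(pivot)
        # Elimination row: divide by the leading coefficient and change the sign.
        elim = [-value / pivot for value in rows[k]]
        elimination_rows.append(elim)
        for i in range(k + 1, n):
            left = rows[i][k]  # constant left end of the hypotenuse for row i
            # Right-triangle rule: entry + (left end) * (elimination-row entry).
            rows[i] = [rows[i][j] + left * elim[j] for j in range(n + 2)]
    unknowns, controls = [0.0] * n, [0.0] * n
    for k in reversed(range(n)):  # the reverse process
        elim = elimination_rows[k]
        unknowns[k] = sum(elim[j] * unknowns[j] for j in range(k + 1, n)) + elim[n]
        controls[k] = sum(elim[j] * controls[j] for j in range(k + 1, n)) + elim[n + 1]
    return unknowns, controls, pivots

# Normal equations rounded to three decimals, in the order x, y, z, u.
COEFFS = [
    [6.909, 0.081, 0.870, -0.440],
    [0.081, 4.936, 0.070, -2.510],
    [0.870, 0.070, 13.000, -1.739],
    [-0.440, -2.510, -1.739, 1.624],
]
FREE_TERMS = [-0.630, -0.523, 0.823, 0.190]

unknowns, controls, pivots = solve_with_control(COEFFS, FREE_TERMS)
print("unknowns:", [round(v, 4) for v in unknowns])
print("checks:  ", [round(v, 4) for v in controls])
print("weight of the last unknown:", round(pivots[-1], 3))
```

Each control unknown is smaller by unity than the corresponding unknown, and the last leading coefficient reproduces the weight of u to three decimals.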

The last number in the column labeled u, printed in italics, is the weight of the last
unknown, derived by use of Gauss' theorem:


$$p_{\bar{u}} = 0.113.$$


We obtain the weight of z if we multiply the weight of u by the ratio of the elements of the
principal diagonal in system II:


$$p_{\bar{z}} = 0.113 \cdot \frac{12.889}{0.325} = 4.480.$$


To obtain the weights of the remaining unknowns we again solve the system of normal
equations, reversing the order of the unknowns and the equations:


        u       |       z       |       y       |       x       |    L    |    S
      1.624     |    -1.739     |    -2.510     |    -0.440     |  0.190  | -2.875
     -1.739     |    13.000     |     0.070     |     0.870     |  0.823  | 13.024
     -2.510     |     0.070     |     4.936     |     0.081     | -0.523  |  2.054
     -0.440     |     0.870     |     0.081     |     6.909     | -0.630  |  6.790
 (u) = -0.0421  |    +1.0708    |    +1.5456    |    +0.2709    | -0.1170 | +1.7703
                |    11.138     |    -2.618     |     0.399     |  1.026  |  9.945
                |    -2.618     |     1.057     |    -0.599     | -0.229  | -2.389
                |     0.399     |    -0.599     |     6.790     | -0.579  |  6.011
                | (z) = -0.0759 |    +0.2351    |    -0.0358    | -0.0921 | -0.8928
                |               |     0.442     |    -0.505     | +0.012  | -0.051
                |               |    -0.505     |     6.776     | -0.616  |  5.655
                |               | (y) = +0.0838 |    +1.1425    | -0.0271 | +0.1154
                |               |               |     6.199     | -0.602  |  5.597
                |               |               | (x) = +0.0971 |         |


The calculations are carried out as before. As a check, we again calculate the values
of the unknowns, although this is not absolutely necessary. Since all the roots are small,
even noticeable differences in the roots in the second calculation should be considered
admissible. For final values, we take the averages of the two calculations. A check of
the first and second sets of roots in the normal equations shows that the two sets are
equally acceptable, since the roots of both satisfy the normal equations with sufficient
accuracy (the maximum deviation being 0.0005).

We find the weights of the "last" two unknowns:


$$p_{\bar{x}} = 6.199, \qquad p_{\bar{y}} = \frac{0.442}{6.776} \cdot 6.199 = 0.404.$$


We recall also that

$$p_{\bar{z}} = 4.480, \qquad p_{\bar{u}} = 0.113.$$

The values that we take for the unknowns (the averages of the two solutions) are

$$x = +0.0972; \quad y = +0.0856; \quad z = -0.0754; \quad u = -0.0390.$$


We substitute these values in all the conditional equations to obtain the residual errors.
The results should be written in a column labelled ε along with the matrix of the
conditional equations. We then obtain directly


$$s = \sum_{k=1}^{13} \varepsilon_k^2 = 0.006837.$$


Calculating the sum of the squares of the residual errors according to formula
(16.57) derived in the preceding section, we obtain


$$s' = 0.007081.$$


Here [ll] = 0.182531. If we consider the accumulation of errors resulting from rounding
off when the calculations are made directly and also when they are made by means of
formula (16.57), we may consider the difference between s and s' as admissible. The
four significant figures that are written above are obtained formally, but they are not
accurate. Therefore, we take


$$s = 0.0070,$$


that is, we take a number close to s' on the basis of the observation with regard to the
greater accuracy of s' as compared with that of s. According to formula (16.58), the
most probable value of the mean square error per unit weight is given by


$$\sigma_0^2 = \frac{0.0070}{9} = 0.000778, \qquad \sigma_0 = 0.028.$$


Since the conditional equations are equally precise, this number is the mean square
error of one conditional equation. According to formulas (16.59), we find the variances
of the unknowns to be


$$\sigma_{\bar{x}}^2 = \frac{0.00078}{6.20} = 0.00013, \qquad \sigma_{\bar{y}}^2 = \frac{0.00078}{0.40} = 0.0020,$$
$$\sigma_{\bar{z}}^2 = \frac{0.00078}{4.48} = 0.00017, \qquad \sigma_{\bar{u}}^2 = \frac{0.00078}{0.113} = 0.0069.$$

Therefore,

$$\sigma_{\bar{x}} = 0.011, \quad \sigma_{\bar{y}} = 0.045, \quad \sigma_{\bar{z}} = 0.013, \quad \sigma_{\bar{u}} = 0.083.$$

We write the results in the form

$$x = 0.097 \pm 0.011, \qquad y = 0.086 \pm 0.045,$$
$$z = -0.075 \pm 0.013, \qquad u = -0.039 \pm 0.083.$$


These results show that the given system of conditional equations makes it possible
to determine only the unknowns x and z with sufficient sureness. The value of y that
we find is doubtful and the value found for u is completely unreliable. For, if we take
into account the normal law of distribution, we may use the one-sigma bounds to write


$$P(-0.122 < u < +0.044) = 0.68.$$


Consequently, even the sign of u is doubtful because we may expect with a probability
of only 0.68 that u will be negative.
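The closing figures of this example can be reproduced with a short Python sketch based on formulas (16.58) and (16.59). The variable names are assumptions made here; the weights, the adopted value s = 0.0070, and the value of u are those of the example. Because the sketch does not round the variances before taking the square root, the mean square error of y comes out as about 0.044 rather than 0.045.

```python
import math

s = 0.0070          # adopted sum of the squares of the remainders
n, m = 13, 4        # conditional equations and unknowns

sigma0_sq = s / (n - m)            # formula (16.58)
sigma0 = math.sqrt(sigma0_sq)

weights = {"x": 6.199, "y": 0.404, "z": 4.480, "u": 0.113}
# Formula (16.59): mean square error of each unknown from its weight.
sigmas = {name: math.sqrt(sigma0_sq / p) for name, p in weights.items()}

# Probability that u is negative, under a normal law centred on the found value.
u_value = -0.039
prob_negative = 0.5 * (1.0 + math.erf((0.0 - u_value) / (sigmas["u"] * math.sqrt(2.0))))

print(round(sigma0, 3), {k: round(v, 3) for k, v in sigmas.items()})
print(round(prob_negative, 2))
```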


Chapter 17 


EMPIRICAL FORMULAS 


88. STATEMENT OF THE PROBLEM 


The most important problem in astronomy and in science in general 
is that of finding the principles governing natural phenomena. 
This is done by accumulating observational material and deducing 
from it information as to the laws that the phenomenon in question 
obeys. The solution of such a problem is exceedingly important, 
since if we know the laws governing phenomena, we can predict 
the pattern that the phenomena will follow in the future. For 
illustration, we need only cite one example from astronomy. The 
discovery of the laws governing the motions of heavenly bodies 
(the laws of Kepler and Newton) enables us to predict solar and 
lunar eclipses with an accuracy that is quite sufficient for practical 
purposes. In ancient times, when eclipses were considered cata- 
strophic events, priests of various cults who had learned empirically 
to predict eclipses used this ability to enhance their power. 

In what follows, we shall speak of numerical and functional 
regularities of behavior. 

Let us consider the following (relatively) simple problem. 
In connection with some phenomenon, we observe the values of 
two quantities t and x; that is, from n observations we obtain the
$t_k$ and $x_k$ (for k = 1, 2, ..., n). Let us assume that x is some
unknown function of t. We wish to find a function that approximately
represents the relationship between t and x.

In a sense, the recording of the observations in this problem 
amounts to defining a tabular function, which we need to approxi- 
mate by some function. 

One important remark should be made in connection with this. 
In nature, there are no such simple relationships between only 
two quantities. Ordinarily every quantity depends on quite a number 
of other quantities. A problem of finding a relationship between 
two quantities can arise either when we assume that the influence 
on the quantity x of the remaining arguments can be neglected,
or when we assume that all these other arguments have at least 
approximately constant values during the entire period of obser- 
vation. 




The first assumption means that, taking into consideration the 
accuracy with which the observations were made, the errors 
resulting from disregarding the remaining arguments have only a 
slight effect on the value of the function. In the second case, the 
other arguments appear in the functional relationship as constant 
parameters. 

In addition to the basic problem of finding the principles govern- 
ing the phenomena, we also must, in analyzing the results of 
observations, deal with a more modest problem. Observations 
cannot give the values of a function for arbitrary values of the 
argument. Therefore, if we wish to have the values of a function 
for values of the argument that are not given in the table, we must 
resort to a mathematical apparatus for calculating approximate 
values of the function at values of the argument not included in 
the table. This is the problem of interpolation (or sometimes, of 
extrapolation), which we encountered in Part II. One of the methods
of solving interpolational problems was shown then. 

In such cases, what we sometimes do is simply connect the 
points of the graph with a smooth curve giving the values of the 
function for arbitrary values of the argument. If only a very crude 
approximation of the values is required, this method is quite 
satisfactory since it does not require any calculations. Its obvious 
disadvantage is its excessive dependence on the observer because 
he draws the ‘‘smooth’’ curve by eye and to some extent sub- 
jectively, inasmuch as different observers have different concepts 
of smoothness and their eyes are not just alike. Therefore, it is 
difficult to compare the results of different investigators or to 
consolidate them, which often has to be done in astronomy. 

Any function approximating a tabulated function obtained from 
observations is called an empirical function or formula. 

The problem of constructing an empirical function requires the 
introduction of two conditions without which this problem, like 
the general problem of approximating a function, is indeterminate. 

1. An analytical expression for the function must be chosen 
that will approximate the tabulated empirical function. 

2. The empirical formula must be as consistent as possible 
with the observed results. 


89. CHOICE OF THE TYPE OF FORMULA 


Choice of the type of formula is the most indefinite and difficult 
part of the work. Sometimes, it needs to be done more than once. 
First a graph of the empirical table is constructed. In many cases, 
a comparison of this point graph with various curves with whose 
equations we are familiar gives an indication as to the possible 
type of formula. For example, if the points of the graph are 
distributed as shown in Figure 9, that is, if they occur at only 
slight distances from a straight line, we should look for a formula 
of the form 


$$x = a + bt,$$


where a and b are coefficients to be determined from the obser- 
vational data. 


Fig. 9. An example of an empirical distribution
close to direct proportionality.


Sometimes very general properties of the relationship between 
the variables are known. For example, one variable may be 
approximately inversely proportional to the other, or one variable 
may be a periodic function of the other, with the approximate value 
of the period known, etc. Figure 10 illustrates a distribution of
points more or less along one branch of a hyperbola with asymptotes
parallel to the coordinate axes. In such a case, we may try
the general equation for an equilateral hyperbola,


$$x = \frac{a + bt}{c + dt},$$


that is, we may seek a rational function of t. (One of these coefficients
can be set equal to unity.) We can choose such a function
when we know that the variables are approximately inversely
proportional to each other.


Fig. 10. An example of an empirical distribution
close to inverse proportionality.

In the case of periodicity, we may seek a formula of the form


$$x = a \sin\left(\frac{2\pi}{T}\,t + c\right)$$


(where T denotes the approximately known period) or a trigonometric
polynomial of order higher than first.

Finally, there may be cases in which we can construct a 
crude theory concerning the phenomenon. Then, the function 
determined by this theory can indicate the possible form of an
empirical formula. 


Example 1. A law can be constructed relating the mass and radiation of stars. From
general considerations, we may roughly assume that the radiation of stars of a particular
spectral class is proportional to the surface area of the star. Assuming the star to be
spherical, we see that the radiation is proportional to the square of the radius. According
to the same simplifying assumption, the mass is proportional to the cube of the radius
(when the mass is assumed proportional to the volume). We can then eliminate the
radius and obtain the following formula:


$$L = a m^{2/3},$$


where L is the radiation, m is the mass, and a is a proportionality constant. In this
form, the formula is, of course, unsatisfactory because the simplifications made were
too great. Therefore, we should take a formula of the form


$$L = a m^{b} + c,$$


where a, b, c are constants to be determined from observational data.

Example 2. Construct an empirical law for the solubility of a particular substance
in a particular liquid. We need a formula expressing the amount of substance dissolved
as a function of time. As a basis, we may take a simple differential law. The amount of
the substance that will be dissolved in a small interval of time is proportional to that
interval and to the amount of the substance that has not been dissolved. We denote by
M the original amount of the substance and by x the amount that has dissolved at the
instant t. Then, according to the law that we stated,


$$dx = k(M - x)\,dt,$$


where k is a positive proportionality constant. The variables in this differential equation
are separable and its solution is


$$x = M - b e^{-kt},$$

where b is an arbitrary constant. Since x = 0 when t = 0, we have b = M, and hence,

$$x = M - M e^{-kt}.$$

Starting with this approximate relationship, we may seek a formula of the form


$$x = M - a e^{-bt},$$


where a and b are parameters.
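The integration step in Example 2 can be written out as follows. This is the standard separation of variables, added here as a worked step; the integration constant C is introduced only for illustration.

```latex
\frac{dx}{M - x} = k\,dt
\;\Longrightarrow\;
-\ln(M - x) = kt + C
\;\Longrightarrow\;
M - x = e^{-C} e^{-kt} = b\,e^{-kt}, \qquad b = e^{-C}.
```

Setting t = 0 and x = 0 in the last relation gives b = M.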


Whatever the form of the empirical formula, it will have one 
feature that is common to all the formulas, namely, the presence 
of literal parameters that must be determined according to some 
criterion of closeness of the formula to the table. The presence 
of several parameters makes the formula more flexible. 




Usually, we try to satisfy one further condition. It is desirable 
to have a formula that is linear with respect to the parameters or 
that can be reduced to linear form by simple substitutions. For 
this reason, we often use algebraic or trigonometric polynomials.
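A formula that is not linear in its parameters can sometimes be reduced to linear form by a substitution. The Python sketch below illustrates this for the formula of Example 2, under the assumption that M is known: taking logarithms gives ln(M - x) = ln a - bt, which is linear in the parameters ln a and b. The data are synthetic, generated from assumed values a = 8 and b = 0.5, and the function name `fit_line` is hypothetical.

```python
import math

def fit_line(ts, ys):
    """Least-squares straight line y = p + q*t; returns (p, q)."""
    n = len(ts)
    st, sy = sum(ts), sum(ys)
    stt = sum(t * t for t in ts)
    sty = sum(t * y for t, y in zip(ts, ys))
    det = n * stt - st * st
    p = (sy * stt - st * sty) / det
    q = (n * sty - st * sy) / det
    return p, q

# Synthetic, error-free table generated from x = M - a*exp(-b*t) (assumed values).
M = 10.0
true_a, true_b = 8.0, 0.5
ts = [0.0, 1.0, 2.0, 3.0, 4.0]
xs = [M - true_a * math.exp(-true_b * t) for t in ts]

# Substitution z = ln(M - x) turns the formula into z = ln(a) - b*t.
zs = [math.log(M - x) for x in xs]
p, q = fit_line(ts, zs)
a_est, b_est = math.exp(p), -q

print(a_est, b_est)
```

With real observations the substitution also changes the relative weights of the conditional equations, which is worth bearing in mind when the residual errors are examined.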


90. THE USE OF LEGENDRE’S PRINCIPLE FOR 
DETERMINING THE VALUES OF THE PARAMETERS 


The method of calculating the parameters in the empirical for- 
mulas of course depends on the criterion agreed upon for the 
closeness of the values given by the formula to those of the table.
In the problem of exact interpolation (see Part II), we chose an 
algebraic (or a trigonometric) polynomial that would coincide 
exactly with the tabulated values of the function at the basic points. 
As we recall, this convention is meaningful if the table is exact, 
that is, if we can rely on all the digits of the numbers given in the 
table. In our present problem, where both the values of the func- 
tion and the values of the argument are obtained from observations, 
we cannot consider them exact in all formally obtained digits. 
For example, if measurements of an angle are made with an 
accuracy of 0.1’, the random errors and the systematic errors 
that are not taken into account may together amount to 1-2′ or
even more (for example, when observations are made of a comet). 
Therefore, there is no point in requiring that an empirical formula 
exactly represent the tabular function.* 

Let us illustrate this graphically (see Fig. 11). 

It is assumed that we know the limiting errors of the obser- 
vations or their mean square errors. Let us suppose that we know 
the limiting errors of $t_k$ and $x_k$. We denote these by $\tau_k$ and $\varepsilon_k$.
We denote by $P_k$ the point on the graph with coordinates $(t_k, x_k)$.


Fig. 11. A "rectangle of possible deviations"
of a point in an empirical distribution.


*This would be difficult even from a purely technical point of view because we would
need to make more observations and we would then have to use polynomials of higher
orders. (In the method of point interpolation, the degree of the polynomial is 1 less than
the number of basic points.)


Observations yield a sequence of points $P_k(t_k, x_k)$. If the errors in
t and x can be assumed independent, the abscissa t could yield
an arbitrary value from $t_k - \tau_k$ to $t_k + \tau_k$, depending on how the
factors that produce errors are combined. Analogously, instead
of $x_k$, we could obtain an arbitrary number from $x_k - \varepsilon_k$ to $x_k + \varepsilon_k$.
Thus, because of the errors, the point $P_k$ can fall at an arbitrary
point of a rectangle whose sides are $2\tau_k$ and $2\varepsilon_k$ and whose center
is at the point $(t_k, x_k)$. Such rectangles should be constructed
around every point obtained from observations. It would be essen- 
tially sufficient for the graph of the empirical formula to pass 
through all these rectangles. 

If, instead of the limiting errors, the quadratic means $\sigma_{t_k}$ and
$\sigma_{x_k}$ are given, they can be reduced to the limiting errors with a
probability close to unity ($2\sigma$ with probability 0.95 or $3\sigma$ with
probability 0.9973).

However, the problem of constructing empirical formulas was 
not discussed in such a formulation because it would have been too 
complicated. Therefore, another method has been chosen in prac- 
tice, namely, the use of Legendre’s principle, with which we are 
already familiar: 

The parameters of the chosen empirical formula are determined
in such a way that the sum of the squares of the deviations
from the tabulated values of the function is minimized.

This condition of approximating an empirical tabulated func- 
tion is simply a convention to which we cannot give a probabilistic 
explanation. We can only note that the choice of this convention 
is motivated by the following considerations. In most problems, 
the number of values of an empirical function is not very small. 
Therefore, it is not possible to make the deviations of the individual 
points small. We can only introduce some general condition applying
to the entire set of points. There is no sense in stipulating that 
the deviations, with signs considered, be minimized, since or- 
dinarily there are both positive and negative deviations. Such 
deviations, even if they are large in absolute value, can cancel 
each other out. The condition that the sum of the absolute values 
of the deviations be small is a quite natural one. However, it would
lead to more tedious calculations. Finally, we might require that
the sum of the fourth- or sixth- or even higher-order powers be
minimized, but this would be a formal exercise in generalizations. 
Therefore, at the present time, it is customary to determine the 
parameters by the method of least squares, which leads to rather 
simple computations. 

Determination of the parameters of an empirical formula on the 
basis of Legendre’s principle is done in the following manner. 
Successive substitution of all the tabulated values of the argument 
and the function into the chosen formula with literal parameters 
leads to a system of conditional equations for determining the 
numerical values of the parameters from observations. In the 
general case, these equations are incompatible, which is ex- 
plained not only by the random errors in the measurements, 
but also by the facts that the chosen function is only an approximation
of the unknown exact formula and that we are neglecting the dependence
of the function on other arguments. The normal equations are
set up from the conditional equations and are solved by one of the 
methods explained above. 

Since it is convenient to have linear conditional equations for 
carrying out the calculations, the formulas should be chosen in 
such a way that they are linear with respect to the parameters or 
can easily be reduced to linear equations by suitable substitutions. 

Here, the calculation of the mean square errors of the unknowns 
loses its probabilistic meaning. It would be more correct to call 
them the ‘‘mean square deviations.’’ However, these quantities 
still need to be calculated, since they give a representation of the 
reliability of the values calculated for the parameters. In particular, 
they show how many digits can be kept in the figures for the 
parameters. 


91. CHECKING OF EMPIRICAL FORMULAS 


Since, in contrast with point interpolation, an empirical formula 
does not represent exactly the tabular values of the function, the 
natural method of making a first check consists in calculating 
the values of the function at the tabulated values of the argument 
by the derived formula and comparing the results with the observed 
values. In other words, we calculate the residual errors of the 
conditional equations. A similar table can serve for checking the 
suitability of the empirical formula. 

If none of the remainders exceed the limiting errors of the 
measurements of the function in absolute value, we may consider 
the deduced empirical formula as satisfactory. If all the errors 
appreciably exceed the limiting errors in absolute value, the 
formula is obviously unsatisfactory. In most cases, what we have 
is an intermediary case—both large and small remainders. Then, 
we can use the signs of the remainders. If the signs alternate, 
the formula can be accepted. However, if the positive and negative 
remainders come in groups, the validity of the formula should be 
questioned, since there are systematic deviations. 
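The examination of the signs can be sketched as a count of sign changes along the table. The function name and the two sample sets of remainders below are assumptions made for illustration; no threshold is implied, since the text gives only the qualitative rule.

```python
def sign_changes(remainders):
    """Number of changes of sign between successive nonzero remainders."""
    signs = [1 if r > 0 else -1 for r in remainders if r != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

# Hypothetical remainders whose signs are mixed along the table.
mixed = [-0.03, +0.04, +0.01, -0.02, -0.05, +0.03]
# Hypothetical remainders that come in one positive and one negative group.
grouped = [+0.03, +0.02, +0.01, -0.01, -0.02, -0.04]

print(sign_changes(mixed), sign_changes(grouped))
```

A single change of sign over the whole table, as in the second set, is the pattern that points to a systematic deviation.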

Finally, to evaluate the suitability of a formula, we may use 
the mean square error of a single conditional equation. If this quantity is of
the order of approximately one half the limiting error of the 
measurements of the function, the formula may be accepted. If 
it exceeds the limiting error, the formula is unsatisfactory. 

If, upon investigation, the formula is found to be completely 
unusable or if it raises doubts, the investigator is advised to 
repeat the work with some modification of the formula. After 
several formulas have been set up, the one whose mean square 
error per unit weight is least should be considered the best. 
It is convenient to begin the choice of formulas by constructing 
a graph of the empirical formula and comparing it with the graph 
of the tabulated values. 




In evaluating the suitability of a formula, we can sometimes 
use physical information regarding the fixed function. For example,
if an empirical formula gives a maximum at some value of the
argument and if we know that the fixed function is monotonic in
a neighborhood of this value of the argument, the empirical formula
found cannot be considered satisfactory. 


92. AN ILLUSTRATION OF THE DERIVATION OF AN 
EMPIRICAL FORMULA 


Suppose that observations yield the following tabulated function: 


t    0.0   0.1   0.2   0.3   0.4   0.5
x    1.3   1.0   0.8   0.6   0.4   0.1


Construction of a small-scale graph gives approximately a straight line. Therefore,
let us construct a linear formula of the form


$$x = a + bt,$$


where a and b are unknown coefficients.

Let us set up the conditional equations according to the scheme in which the control
numbers s are included, that is, the sums of the coefficients and the free terms (the
latter transferred to the left-hand side). This scheme also includes the columns of
residual errors ε and their squares. (Of course, these two columns are filled in after
solution of the system of normal equations.) The last column need not be filled in if
$s = \sum_{k=1}^{6} \varepsilon_k^2$ is calculated on a calculating machine by the
method of accumulation.


         a   |   b   |    L   |    s   |    ε    |  ε²·10⁴
         1   |  0.0  |  -1.3  |  -0.3  |  -0.03  |     9
         1   |  0.1  |  -1.0  |  +0.1  |  +0.04  |    16
         1   |  0.2  |  -0.8  |  +0.4  |  +0.01  |     1
         1   |  0.3  |  -0.6  |  +0.7  |  -0.02  |     4
         1   |  0.4  |  -0.4  |  +1.0  |  -0.05  |    25
         1   |  0.5  |  -0.1  |  +1.4  |  +0.03  |     9

Sums     6   |  1.5  |  -4.2  |  +3.3  |  -0.02  |  0.0064

The normal equations are of the form

    a    |   b    |    L    |   S
   6     |  1.5   |  -4.2   |  3.3
   1.5   |  0.55  |  -0.65  |  1.4

   7.5   |  2.05  |  -4.85  |  4.7


We can check the normal equations by using the number S, which is obtained in the
first equation by adding all the numbers s (since the coefficients of a are all equal to
unity). In the second equation, S is the sum of the products of the numbers s and the
coefficients of b in the conditional equations. Furthermore, a supplementary control may
be obtained by adding the coefficients and the free terms in the normal equations, and
comparing this sum with the sum of all the numbers S. This check appears in the third
row.


Let us solve the system of normal equations by means of determinants. In order to
avoid an accumulation of errors resulting from the approximate calculations, we perform
the calculations as if the coefficients of the normal equations were the exact numbers.
In the present problem, this presents no difficulties because all the coefficients have only
a small number of digits. We evaluate the determinants:


$$D = \begin{vmatrix} 6 & 1.5 \\ 1.5 & 0.55 \end{vmatrix}, \quad D_a = \begin{vmatrix} 4.2 & 1.5 \\ 0.65 & 0.55 \end{vmatrix}, \quad D_{\bar{a}} = \begin{vmatrix} -3.3 & 1.5 \\ -1.4 & 0.55 \end{vmatrix},$$
$$D_b = \begin{vmatrix} 6 & 4.2 \\ 1.5 & 0.65 \end{vmatrix}, \quad D_{\bar{b}} = \begin{vmatrix} 6 & -3.3 \\ 1.5 & -1.4 \end{vmatrix}.$$


The determinants $D_{\bar{a}}$ and $D_{\bar{b}}$ are obtained by replacing the column L with the column
S (in order to check the solution of the system of normal equations). The unknowns
$\bar{a}$ and $\bar{b}$ must be one less than a and b, respectively. Evaluation of the
determinants yields


$$D = 1.05, \quad D_a = 1.335, \quad D_b = -2.4, \quad D_{\bar{a}} = 0.285, \quad D_{\bar{b}} = -3.45.$$


We now calculate the original and the control unknowns:


$$a = \frac{1.335}{1.05} = 1.271, \qquad b = -\frac{2.4}{1.05} = -2.286,$$
$$\bar{a} = \frac{0.285}{1.05} = 0.271, \qquad \bar{b} = -\frac{3.45}{1.05} = -3.286.$$


We calculate the mean square error of a single conditional equation by substituting
into the conditional equations the values found for the coefficients; we then calculate
the remainders, denoted in the diagram of the conditional equations by the letter ε:


$$\sigma_0^2 = \frac{0.0064}{4} = 0.0016, \qquad \sigma_0 = 0.04.$$


We determine the weights of the unknowns by means of the determinants:


$$p_a = \frac{1.05}{0.55} = 1.91, \qquad p_b = \frac{1.05}{6} = 0.18.$$


We determine the mean square errors (or, more precisely, the mean square deviations)
of the unknowns:


$$\sigma_a^2 = \frac{0.0016}{1.91} = 0.00084, \qquad \sigma_b^2 = \frac{0.0016}{0.18} = 0.0089,$$
$$\sigma_a = 0.029, \qquad \sigma_b = 0.094.$$


The result obtained when we determine the coefficients can be written as

$$a = 1.271 \pm 0.029, \qquad b = -2.286 \pm 0.094.$$


Taking the values of the mean square errors into account, we write the result in the
form


$$a = 1.27 \pm 0.03, \qquad b = -2.29 \pm 0.09.$$

The empirical formula is then

$$x = 1.27 - 2.29t.$$

Examination of the residual errors shows that the formula represents the observations
quite satisfactorily, provided we assume that the values of x are given with a limiting
error of 0.05. The absolute value of only one of the remainders is equal to 0.05 and the
absolute values of the other remainders are less than this.


Part V 
ANALYSIS OF STATISTICAL MATERIAL 


Chapter 18 


ANALYSIS OF A ONE-DIMENSIONAL 
STATISTICAL SET 


93. STATISTICAL SETS 


One of the problems encountered in the natural sciences consists 
in studying objects or phenomena that have certain common 
characteristics. (The individual objects in such a set may differ 
one from the other in certain other characteristics.) In astronomy 
and in other disciplines, such problems are encountered quite 
often. The stars in a catalogue, the catalogue of elements of the 
orbits of the minor planets, etc., are such sets. For example, 
in the catalogue of elements of the orbits of the minor planets, 
known asteroids are included in a single catalogue, because these 
heavenly bodies are small in mass and dimensions and their 
motion is determined in large measure by solar attraction. We 
may say that the objects of the catalogue are grouped together in 
a set according to some qualitative criterion.

But every object in the set has its own individual characteristics 
(in our example, there are characteristics which vary from one 
minor planet to another). If we single out those characteristics 
of the minor planets that determine their motion (that is, the 
elements of the osculating ellipses of some epoch), we obtain a 
statistical set in which each member has numerical values for 
the six orbital elements describing the motion. If we choose a
single orbital element, for example, the inclination, we obtain a
one-dimensional statistical set.

We shall limit ourselves to consideration of one- and two- 
dimensional sets, although we sometimes must deal with sets of 
greater dimension. In such cases, one should consult more complete
texts, for example, the book by V. I. Romanovskii.

The first problem in the study of statistical sets consists in
finding an approximate expression for the distribution function or
the probability density from empirical material and in developing
a method of obtaining a few well chosen numerical characteristics
(in order to gain an overall picture of the entire statistical
set).

This problem is closely related to another, namely, that of 
finding the probability that a randomly chosen object in the set has 
a value within given bounds. 




We could confine ourselves to these problems if the empirical 
set contained all objects of the type investigated. Such complete sets 
are called general sets. The simplest example of a general set is 
census material giving information about all countries. In the sets 
studied in astronomy (the set of minor planets, the set of variable 
stars of a definite type, etc.), the general set is almost always un- 
known because every year new members of each such set are dis- 
covered. We might consider as an exception the set of stars of 
visible magnitude attainable by ordinary instruments. All such 
stars are known, but the set consisting only of stars of visible 
magnitude is not of great interest to science. 

In most problems, we know a set which can be considered a
sample of an existent general set only a part of which has been
discovered by observations. If some objects of the general set fall
purely by chance into the observed material and other objects do
not, we have a random sample of the general set. However, in a
large number of problems, the reason that not all objects of the
general set appear in the statistical material does not lie in random
causes, but is due to the fact that certain of the objects cannot be
discovered by present-day methods of observation. In other cases,
it is impossible, for various reasons, to ascertain the criterion
according to which the set is chosen. If either of the above two
situations occurs, we say that the set is the result of selection.

Examples of the first type of selection (due to the limitations of
observational methods) are obvious: with present-day instruments,
it is impossible to discover stars fainter than visible magnitude
$+24^m$. Very small asteroids can be seen only when they get
close to the earth, and so on. Somewhat less obvious are the cases
of selection that result from properties of the heavenly bodies
themselves. We mention two examples: (1) It is possible to see the
eclipses of an eclipsing double star only when the observer is in a
certain restricted region of space formed by the cone tangent to the
surfaces of both stars. (2) Because of the limited resolving power of
spectroscopic devices, we cannot detect the radial velocities of
stars unless they exceed a definite value.

If the first two problems are completely solved for the sample set,
the third problem then arises, that of finding out to what extent
the numerical parameters obtained for the sample set describe the
general set.

In this part of the book, we shall consider primarily the first
two problems. We confine ourselves to the basic problem mentioned
above, namely, that of developing means of obtaining certain numbers
that might be considered as satisfactorily describing the
given statistical set. To solve such a problem, we shall consider the
value of that numerical element (or of those numerical elements)
being studied as random. If it is possible to find an approximate
law of distribution of this random variable, its parameters may be
considered sufficiently characteristic of the entire set, since it is
possible to calculate from these parameters the probability that
the value of the variable in question will lie between arbitrary
given bounds. Knowledge of this probability makes it possible to
determine (with the same probability as above) the number of
objects corresponding to each interval of values of the variable
(by multiplying the total number of objects by the value of the
probability).

The method of finding the law of distribution from empirical 
material is the same as the method of constructing empirical 
formulas (see the preceding chapter). The difference consists in 
the fact that the number of known theoretical laws of distribution 
is finite. Quite frequently, we first check the suitability of the 
normal distribution law. 

The chosen theoretical law must be checked against the observational
material.


94. A DISCRETE EMPIRICAL DISTRIBUTION AND ITS 
NUMERICAL CHARACTERISTICS 


On rare occasions, it happens (in astronomical studies) that one is 
dealing with a discrete empirical distribution (for example, the set 
of multiple stars). Another reason for considering discrete distri- 
butions is that the procedure for analyzing a continuous empirical 
distribution is almost the same as for a discrete one. 

The material obtained from observations constitutes a catalogue 
in which the value of a discrete random variable X is shown for 
every object of the set (for every observation). Suppose that the 
variable X can assume the values

$$x_1, x_2, \ldots, x_s,$$

written in order of magnitude. The numbers of observations that
give each of the possible values $x_1, x_2, \ldots, x_s$ of the variable X
are called the multiplicities and are denoted by $n_1, n_2, \ldots, n_s$.
The ratio of the multiplicity $n_k$ to the overall volume (number of
elements) $n$ of the set is called the empirical probability that the
variable X will take the value $x_k$.

The initial processing of the material containing the n objects 
is begun by calculating the multiplicities and is put in the form of 
the following table. 


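$$
\begin{array}{lllllll}
x_1, & x_2, & \ldots, & x_{s-1}, & x_s, & (x_{s+1} > x_s) & \\
n_1, & n_2, & \ldots, & n_{s-1}, & n_s, & (0), & \sum_{k=1}^{s} n_k = n \\
p_1, & p_2, & \ldots, & p_{s-1}, & p_s, & (0), & \sum_{k=1}^{s} p_k = 1, \quad p_k = \dfrac{n_k}{n} \\
0 = N_1, & N_2, & \ldots, & N_{s-1}, & N_s, & N_{s+1} = n, & N_k = \sum_{r=1}^{k-1} n_r \\
0 = P_1, & P_2, & \ldots, & P_{s-1}, & P_s, & P_{s+1} = 1, & P_k = \dfrac{N_k}{n}
\end{array}
$$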
In this table, the numbers $n_1, n_2, \ldots, n_s$ represent the
multiplicities of the respective values of the variable. The first two rows
are called the empirical distribution table of the random variable.
Instead of the numbers $n_k$, we introduce the empirical probabilities
$p_k = n_k/n$ appearing in the third row. However, in performing the
calculations, it is more convenient to deal with whole numbers and
to leave the division by $n$ (in order to convert to probabilities) until
the end of the calculations. Therefore, the numbers $p_k$ are rarely
included in the table. The fourth row gives the numbers $N_k$ of
objects for which the values of the random variable are less than
$x_k$. Obviously, we have

$$N_1 = 0, \quad N_2 = n_1, \quad N_3 = n_1 + n_2, \quad \ldots, \quad
N_s = n_1 + n_2 + \ldots + n_{s-1} = n - n_s.$$

In the first row, let us add one more value $x_{s+1}$ arbitrarily close to
$x_s$ but somewhat greater than it. Since observations do not yield
such a value, $n_{s+1} = 0$. Then, by the definition of the number $N_k$, it
follows that $N_{s+1} = n$. If we divide all the numbers $N_k$ (for
$k = 1, 2, \ldots, s, s+1$) by the total number of observations, we obtain the
numbers $P_k$ appearing in the fifth row. They may be considered
empirical values of the distribution function of the discrete random
variable X. In the case of a discrete variable, the distribution
function has a discontinuity at each of the values $x_k$.

The basic problem of determining the probability that the random
variable will take a value between the given limits $\alpha$ and $\beta$ is
easily solved with the help of the table that we have constructed.
If we rule out the possibility of X assuming one of these bounds,
we will always have

$$P(\alpha < X < \beta) = \sum_{\alpha < x_r < \beta} p_r.$$

The summation is taken over all r satisfying the inequality following
the probability symbol.

The method of finding the probability

$$P(\alpha \le X \le \beta)$$

depends on the coincidence or non-coincidence of one or both of
the bounds with the tabular values. For example, if $\beta = x_r$, where
$r < s$, and $\alpha$ is not equal to any of the numbers $x_k$, we have

$$P(\alpha \le X \le \beta) = \sum_{\alpha < x_k \le x_r} p_k.$$

The summation begins with the term corresponding to the smallest
value of X that exceeds $\alpha$.
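The construction of the five-row table and the summation of empirical probabilities between two bounds can be sketched as follows. The sample (ten systems with multiplicities 1, 2, 3) is purely hypothetical, and the function and key names are assumptions of this sketch, not notation from the text.

```python
from collections import Counter
from fractions import Fraction

def distribution_table(observations):
    """Rows x_k, n_k, p_k, N_k, P_k for a discrete sample.

    N_k counts the observations strictly less than x_k, as in the text.
    """
    counts = Counter(observations)
    n = len(observations)
    rows, below = [], 0
    for x in sorted(counts):
        n_k = counts[x]
        rows.append({"x": x, "n": n_k, "p": Fraction(n_k, n),
                     "N": below, "P": Fraction(below, n)})
        below += n_k
    return rows

def prob_between(rows, alpha, beta, include_bounds=False):
    """Empirical probability of alpha < X < beta (or alpha <= X <= beta)."""
    if include_bounds:
        chosen = (r["p"] for r in rows if alpha <= r["x"] <= beta)
    else:
        chosen = (r["p"] for r in rows if alpha < r["x"] < beta)
    return sum(chosen, Fraction(0))

# Hypothetical sample: 5 single, 3 double and 2 triple systems.
sample = [1] * 5 + [2] * 3 + [3] * 2
table = distribution_table(sample)
for row in table:
    print(row)
print(prob_between(table, 1, 3))                          # bounds excluded
print(prob_between(table, 1.5, 3, include_bounds=True))   # upper bound is tabular
```

Keeping the probabilities as exact fractions mirrors the advice in the text to work with whole numbers and divide by n only at the end.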

If a table such as we have been considering can be constructed
and is not too large, it will give complete information regarding a
discrete random variable. Such a table can be used for describing
the set and for solving the basic problem when it is a question of
a single set of a definite type. If we need to compare different
sets of a single type (for example, sets of multiple stars in various
regions of the galaxy), comparison of the complete tables is by no
means always easy to perform. It is convenient to have a small
number of definite and distinctive numerical characteristics of the
set (in order to compare these characteristics instead of comparing
the complete tables). Let us consider a few such characteristics.


1. The Average Value of a Set. 


The simplest overall numerical characteristic of a set is probably 
the average value of the random variable defined as the expecta- 
tion of the discrete variable by the formula 


$$\bar{x} = \sum_k x_k p_k. \tag{18.1}$$

However, in contrast with the definition of expectation, the $p_k$ are
empirical probabilities.

Like the expectation, the average value is a number with dimensions.
Therefore, in comparing sets of a single type, it is
necessary that x be measured in the same units. If we consider
sets of different types, we cannot compare average values because
these are different kinds of quantities. If $\bar{x}$ is not equal to any of
the possible values of the variable, we may consider $\bar{x}$ a formal
characteristic and take as an approximation the closest possible
value.


2. The Median. 


Like the median of the theoretical distribution of a continuous
random variable, the median $x_{me}$ of an empirical distribution of a
discrete variable is defined by the following condition: the empirical
probability of obtaining a value less than the median must be
equal to 1/2. (We sometimes say that the median "divides the set
in two.") The median of a discrete variable is most conveniently
determined from the table of values $N_k$. As was shown above, the
numbers in this table increase monotonically (though in the case of
individual pairs of points, they may be repeated). If among the
numbers $N_k$ there is an $N_m$ exactly equal to one half the total number
of observations, we may take any number between $x_{m-1}$ and $x_m$
as the median $x_{me}$ because, by definition,

$$N_m = \sum_{k=1}^{m-1} n_k = n\,P(X < x_m), \qquad P(X < x_m) = \frac{1}{2}. \tag{18.2}$$




If there is no number $N_k$ exactly equal to $n/2$, we find a number
$N_r < n/2$ such that $N_{r+1} > n/2$. The corresponding values of the
variable X will be $x_r$ and $x_{r+1}$. In this case, we could say that the
median $x_{me}$ lies between the numbers $x_r$ and $x_{r+1}$. For $x_{me}$ we may
take either $x_r$ (if $n/2 - N_r < N_{r+1} - n/2$) or $x_{r+1}$ (if the direction
of the inequality is reversed). Since we are considering a discrete
variable having no values between $x_r$ and $x_{r+1}$, the answer can be
given only approximately in the form indicated, that is, either $x_r$ or
$x_{r+1}$.

We can also determine the median $x_{me}$ by linear interpolation
from the values $N_r$, $N_{r+1}$, $x_r$, and $x_{r+1}$ and treat it as a formal
characteristic of the set.
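The rule for the median of a discrete set can be sketched as below. The function names are hypothetical. Two choices are assumptions of this sketch rather than statements of the text: when some N_m equals n/2 exactly, the midpoint of the admissible interval is returned, and when the two differences in the inequality are equal, the larger tabular value is taken.

```python
from fractions import Fraction

def cumulative_counts(ns):
    """Return [N_1, ..., N_{s+1}]: observations strictly below each x_k."""
    counts, total = [0], 0
    for n_k in ns:
        total += n_k
        counts.append(total)
    return counts

def discrete_median(xs, ns):
    """Return (closest tabular value, linearly interpolated value)."""
    N = cumulative_counts(ns)
    half = Fraction(sum(ns), 2)
    # Case 1: some N_m equals n/2 exactly; any number between x_{m-1} and x_m
    # qualifies, and the midpoint is picked here (an assumption).
    for m in range(1, len(xs)):
        if N[m] == half:
            mid = (Fraction(xs[m - 1]) + Fraction(xs[m])) / 2
            return mid, mid
    # Case 2: N_r < n/2 < N_{r+1}.
    r = max(k for k in range(len(xs)) if N[k] < half)
    if r == len(xs) - 1:          # only the fictitious x_{s+1} lies above x_s
        return xs[r], Fraction(xs[r])
    closest = xs[r] if half - N[r] < N[r + 1] - half else xs[r + 1]
    step = Fraction(xs[r + 1]) - Fraction(xs[r])
    interpolated = Fraction(xs[r]) + step * (half - N[r]) / (N[r + 1] - N[r])
    return closest, interpolated

# Hypothetical multiplicities for the values 1, 2, 3, 4 (n = 11).
print(discrete_median([1, 2, 3, 4], [5, 3, 2, 1]))
# Hypothetical case with N_2 = n/2 exactly (n = 10).
print(discrete_median([1, 2, 3], [5, 3, 2]))
```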


3. The Mean Square Deviation. 


By definition, the mean square deviation is the square root of the 
variance. If, in the definition of the variance of a discrete random 
variable, we replace the probabilities with empirical probabilities, 
we obtain the formula for determining the empirical mean square 
deviation: 


$$\sigma_x = \sqrt{\sum_k p_k (x_k - \bar{x})^2}. \tag{18.3}$$

In many problems, the numbers $x_k$ are rather simple, in the
sense that they do not have many digits. For example, if we consider
the set of multiple stars, the numbers $x_k$ have the values 1, 2,
3, 4, ... (the end of the table cannot as yet be considered definite).
In the general case, however, the number $\bar{x}$ is not simple in this
sense; that is, it has several significant figures and calculation
from the formula given can be complicated. Therefore, it is better
to use the formula for calculating the variance (see Section 44 of
Part III):

$$\sigma_x^2 = \sum_k p_k x_k^2 - \bar{x}^2. \tag{18.4}$$

The number $\sigma_x$ has the same dimension as do the numbers $x_k$.
Therefore, the value $\sigma_x$ depends on the choice of units in which the
measurements are made. In comparing sets of the same kind, that
is, sets of the same quantities selected according to different
criteria, it is necessary that the values x of the elements of the
different sets be measured in the same units.

If we must compare different sets, it is convenient to introduce
dimensionless numerical characteristics. To obtain a dimensionless
characteristic of the spread, we may take the quantity

$$V = \frac{\sigma_x}{\bar{x}}, \tag{18.5}$$

known as the coefficient of variability. It is not a convenient
characteristic if $\bar{x}$ is close to 0.
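A brief sketch, under the assumption of a small hypothetical table of multiplicities, shows that formulas (18.3) and (18.4) give the same variance and how the coefficient of variability follows. The function name `spread` is illustrative only.

```python
from fractions import Fraction
from math import sqrt

def spread(xs, ns):
    """Return mean, variance by (18.3) squared, variance by (18.4), sigma, V."""
    n = sum(ns)
    ps = [Fraction(k, n) for k in ns]                  # empirical probabilities
    mean = sum(p * x for p, x in zip(ps, xs))
    var_direct = sum(p * (x - mean) ** 2 for p, x in zip(ps, xs))
    var_short = sum(p * x * x for p, x in zip(ps, xs)) - mean ** 2
    sigma = sqrt(var_short)
    # V is meaningful only when the mean is not close to zero.
    return mean, var_direct, var_short, sigma, sigma / float(mean)

# Hypothetical multiplicities 4, 3, 2, 1 for the values 1, 2, 3, 4.
mean, var_direct, var_short, sigma, V = spread([1, 2, 3, 4], [4, 3, 2, 1])
print(mean, var_direct, var_short, sigma, V)
```

Because exact fractions are used, the two variance formulas agree exactly here; with rounded decimal values of the mean, (18.4) is the one that avoids carrying many digits through every term.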




These two numerical characteristics ($\bar{x}$ and $\sigma_x$) are sufficient
for us to estimate the probability that X will assume a value
between the given bounds, using Chebyshev's inequality:

$$P(|X - \bar{x}| < t\sigma_x) \ge 1 - \frac{1}{t^2}.$$


However, the bounds cannot be chosen completely arbitrarily: they 
must be symmetric about the mean value x. 

The estimate given by Chebyshev’s inequality is rather crude. 
Use of the complete table, if this is possible, will give the exact 
probability. 
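How crude the Chebyshev estimate is can be seen by comparing it with the exact empirical probability taken from the table. The following is a minimal sketch on hypothetical multiplicities; the function name is an assumption.

```python
from fractions import Fraction
from math import sqrt

def chebyshev_vs_exact(xs, ns, t):
    """Return (Chebyshev lower bound, exact empirical probability)
    for the event |X - mean| < t * sigma."""
    n = sum(ns)
    mean = Fraction(sum(k * x for x, k in zip(xs, ns)), n)
    var = Fraction(sum(k * x * x for x, k in zip(xs, ns)), n) - mean ** 2
    sigma = sqrt(var)
    inside = sum(k for x, k in zip(xs, ns) if abs(x - mean) < t * sigma)
    return 1 - 1 / t ** 2, inside / n

# Hypothetical multiplicities 4, 3, 2, 1 for the values 1, 2, 3, 4.
bound, exact = chebyshev_vs_exact([1, 2, 3, 4], [4, 3, 2, 1], 1.5)
print(bound, exact)
```

For this sample the bound guarantees a little more than one half, while the table shows that nine tenths of the objects lie inside the symmetric bounds.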


4. Moments of Distribution. 


The moments of distribution have been defined in Part III for
continuous random variables as the expectations of the powers of a
random variable (the initial moments) and the expectations of the
powers of the deviation of a random variable from the center of
distribution (the central moments).

Let us use this same definition for a discrete random variable,
replacing the theoretical probabilities with empirical ones in the
expectations. Then, for the initial moments of order r, we obtain
the formula

$$\nu_r = \sum_k p_k x_k^r, \tag{18.6}$$

where r is a positive integer.

Since the initial moments depend on the chosen zero value, the
central moments, defined by the formulas

$$\mu_r = \sum_k p_k (x_k - \bar{x})^r, \tag{18.7}$$

are more indicative of the distribution. They do not depend on the
zero value since, when a displacement of the zero value is made,
$\bar{x}$ is displaced the same amount.

For the most part, it is convenient, in making calculations, to
find the initial moments first and then compute the central moments
from formulas that are easily obtained from the binomial
expansion and the reduction of the last two terms:

$$\mu_r = \nu_r - r\bar{x}\,\nu_{r-1} + \frac{r(r-1)}{2}\,\bar{x}^2\nu_{r-2} - \ldots
+ (-1)^{r-2}\,\frac{r(r-1)}{2}\,\bar{x}^{r-2}\nu_2 + (-1)^{r-1}(r-1)\,\bar{x}^r.$$

For example,

$$\begin{aligned}
\mu_2 &= \nu_2 - \bar{x}^2, \\
\mu_3 &= \nu_3 - 3\bar{x}\nu_2 + 2\bar{x}^3, \\
\mu_4 &= \nu_4 - 4\bar{x}\nu_3 + 6\bar{x}^2\nu_2 - 3\bar{x}^4.
\end{aligned} \tag{18.8}$$




For dimensionless numerical characteristics, we may take the 
numbers 


$$M_r = \frac{\mu_r}{\sigma_x^r}$$

or

$$m_r = \frac{\sqrt[r]{\mu_r}}{\sigma_x} \qquad (r = 1, 2, \ldots). \tag{18.9}$$


By using these dimensionless characteristics, we can compare 
not only the different distributions of a single variable, but also the 
distributions of different variables. 
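The passage from initial to central moments by the binomial expansion can be checked numerically. The sketch below uses hypothetical multiplicities and hypothetical function names, and compares the expansion with the direct definition (18.7).

```python
from fractions import Fraction
from math import comb

def initial_moments(xs, ns, max_order):
    """Initial moments nu_0 .. nu_max_order with empirical probabilities n_k/n."""
    n = sum(ns)
    return [Fraction(sum(k * x ** r for x, k in zip(xs, ns)), n)
            for r in range(max_order + 1)]

def central_from_initial(nu, r):
    """Central moment of order r from the binomial expansion of (x - mean)^r."""
    mean = nu[1]
    return sum(comb(r, j) * (-mean) ** j * nu[r - j] for j in range(r + 1))

def central_direct(xs, ns, r):
    """Central moment of order r straight from the definition (18.7)."""
    n = sum(ns)
    mean = Fraction(sum(k * x for x, k in zip(xs, ns)), n)
    return sum(k * (x - mean) ** r for x, k in zip(xs, ns)) / n

# Hypothetical multiplicities 4, 3, 2, 1 for the values 1, 2, 3, 4.
xs, ns = [1, 2, 3, 4], [4, 3, 2, 1]
nu = initial_moments(xs, ns, 4)
for r in (2, 3, 4):
    print(r, central_from_initial(nu, r), central_direct(xs, ns, r))
```

The fourth-order line of (18.8) is the special case r = 4 of `central_from_initial`, with the last two binomial terms already combined.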


95. CONTINUOUS EMPIRICAL DISTRIBUTIONS 


In almost all problems of statistical analysis of material, we must 
deal with continuous variables. 

The observational material is found in different catalogues or in 
specially performed observations. The ‘‘raw’’ material is a list of 
the elements of a set with an indication of the numerical values of 
the variable that are obtained from observations. If the set contains
few objects (for example, ten or twenty), the observational material 
is analyzed directly. If there is a great amount of this material, 
direct analysis of it would be tedious and the labor involved would 
not be justified by better results, since the material is almost 
always of a selective nature. 

The raw material is subjected to a preliminary analysis 
consisting of the following operations: 

By consulting a catalogue, we find the smallest and largest
values of the variable. The first we either decrease or leave
unchanged; the second we either increase or leave unchanged. If a
change is made, it is so that the bounds of the region taken for the
value of X will have fewer significant figures. Some broadening of
the tabulated region of values of X is ordinarily admissible, since
the material is almost always of a selective nature and the actual
region of values of X may be greater than the region obtained from
observations. We divide the region that we now have into s equal
parts (intervals), being careful (for purposes of simplifying the
calculations) to see that the end points of the intervals are numbers
with few significant figures. The question of the number of intervals
does not yet have a theoretical basis and is solved by trial and
error. It is usually convenient to take somewhere between ten and
twenty intervals, but these figures are only guides.

Having decided on some number of intervals, we denote the end
points of the intervals by $x^{(0)}, x^{(1)}, x^{(2)}, \ldots, x^{(s)}$. We now count the
number of elements of the set for which the values of X fall in each
of the intervals. It may happen that individual values of X
obtained from observations are equal to the end
points of some of the intervals. In such cases, an agreement must
be made as to which of the intervals such elements of the set
should be assigned to, that is, to the interval on the left or the one
on the right. What is frequently done is to assign half of each such
case to each of the two adjacent intervals. In order not to deal with
fractions, in such a case we multiply all the counts by two. This
introduces no error because the total number of observations is
also multiplied by two; consequently, the empirical probability of
falling in each of the intervals remains unchanged. The counting
must be made for each of the intervals separately. Then, as a
check, the sum is taken for all the intervals. It must be equal to
the total number of observations.
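The agreement about boundary values can be sketched as follows. Every count is kept doubled, so that a value coinciding with an interior end point contributes 1 to each neighbour and any other value contributes 2 to its own interval. The data are hypothetical, the function name is an assumption, and so is the treatment of values equal to the outermost end points (they are given wholly to the only adjacent interval).

```python
from bisect import bisect_right

def doubled_counts(values, edges):
    """Interval counts multiplied by two, splitting boundary values in halves."""
    s = len(edges) - 1
    counts = [0] * s
    for v in values:
        if not edges[0] <= v <= edges[-1]:
            raise ValueError(f"{v} lies outside the tabulated region")
        i = bisect_right(edges, v) - 1       # edges[i] <= v, and i == s at the top
        if v == edges[i] and 0 < i < s:      # interior end point: one half each side
            counts[i - 1] += 1
            counts[i] += 1
        else:
            counts[min(i, s - 1)] += 2
    return counts

# Hypothetical end points and observations.
edges = [0.0, 0.5, 1.0, 1.5]
values = [0.1, 0.5, 0.7, 1.0, 1.2, 1.5, 0.5]
counts = doubled_counts(values, edges)
print(counts, sum(counts) == 2 * len(values))   # the check on the total
```

The final comparison is the check described in the text, with the total number of observations doubled as well.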

If there are many observations (more than a hundred), it will
become quite tiresome to make a count for each interval separately,
because it will be necessary to look through the catalogue (or list)
as many times as there are intervals chosen. The counting can be
simplified in the following manner. A rectangle of arbitrary
dimensions drawn on graph paper is subdivided into identical
sub-rectangles by parallel lines. The number of these rectangles is
equal to the number of intervals chosen. The values $x^{(k)}$
corresponding to the end points of the intervals are written on the
dividing lines bounding the rectangles. We read the catalogue
(once) and place a point in the small rectangle corresponding to
the value x of each element of the set. When the entire catalogue
has been read, there will be a number of points in each of the
small rectangles. These still need to be counted. If any of the
catalogued values is equal to an end point of one of the intervals,
it is advisable to put a cross rather than a point on the dividing
line.

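After the number of elements in each interval has been counted,
we obtain a table like that shown below. The first column shows the
end points of the intervals; the second shows the values of X
corresponding to the midpoints of the intervals; the fourth shows
the number of observations in each interval.

$$
\begin{array}{cccc}
\text{End points of} & \text{Midpoints of} & \text{Conventional} & \text{Number of} \\
\text{the intervals} & \text{the intervals} & \text{units} & \text{observations} \\
x^{(0)} & & & \\
 & x_1 & -(r-1) & n_1 \\
x^{(1)} & & & \\
 & \vdots & \vdots & \vdots \\
x^{(r-1)} & & & \\
 & x_r & 0 & n_r \\
x^{(r)} & & & \\
 & \vdots & \vdots & \vdots \\
x^{(s-1)} & & & \\
 & x_s & s-r & n_s \\
x^{(s)} & & &
\end{array}
$$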


After the values $n_1, n_2, \ldots, n_s$ have been taken from the table,
we may consider them the multiplicities of the values of a discrete
variable taking the values $x_1, x_2, \ldots, x_s$ with empirical
probabilities

$$p_k = \frac{n_k}{n}.$$

Calculation of the basic characteristics of the distribution (the 
average value, the mean square deviation, the moments) can be 
carried out in the same way as for a discrete variable, from the 
formulas of Section 94. 

To make the calculations easier, it is helpful to determine,
from the table of multiplicities, in what interval the average value
must fall. Usually it will be the interval in which $n_k$ is greatest.
If we consider the $n_k$ as mass points placed in the interiors of the
intervals, the average value $\bar{x}$ will correspond to the center of
mass. We can use this criterion to determine what interval may
contain the average value, if the distribution is very asymmetric.*

Suppose that $x_r$ is the midpoint of the interval in which the
average value lies. Let us assign the number zero to that interval.
We number the intervals following the zero interval in order,
beginning with 1 and going up to $s - r$. We number the intervals
preceding the zero interval with the negative numbers from $-1$
to $-(r-1)$. These conventional numbers or units are shown in the
third column of the above table.

Instead of x, we now introduce a new argument t of the set,
whose values are equal to these conventional units. The variables
x and t are related by the obvious linear relationship $x_{r+t} = x_r + ht$,
where h is the step in the table of values of x.

Introduction of the argument t instead of x considerably simplifies
the calculations because all the parameters are calculated
first for the argument t. The average value and the mean square
deviation are then expressed in units of the same dimension as
those in which the variable being studied is measured. This is
done by means of the formulas

$$\bar{x} = x_r + h\bar{t}, \qquad \sigma_x = h\sigma_t. \tag{18.10}$$
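The use of conventional units and the return to the original units by (18.10) can be sketched on a small hypothetical table (midpoints 11, 13, 15, 17 with step 2). The function name and the data are assumptions made only for illustration.

```python
from fractions import Fraction
from math import sqrt

def mean_sigma_via_units(midpoints, counts, zero_index):
    """Average and mean square deviation computed in units t, then by (18.10)."""
    n = sum(counts)
    h = midpoints[1] - midpoints[0]                       # step of the table
    ts = [k - zero_index for k in range(len(counts))]     # conventional units
    t_mean = Fraction(sum(c * t for c, t in zip(counts, ts)), n)
    t_var = Fraction(sum(c * t * t for c, t in zip(counts, ts)), n) - t_mean ** 2
    x_mean = midpoints[zero_index] + h * t_mean           # first formula of (18.10)
    sigma_x = h * sqrt(t_var)                             # second formula of (18.10)
    return x_mean, sigma_x

# Hypothetical table; the zero interval is the one with the largest multiplicity.
x_mean, sigma_x = mean_sigma_via_units([11, 13, 15, 17], [2, 5, 2, 1], zero_index=1)
print(x_mean, sigma_x)
```

The sums involve only the small integers t, which is the simplification the text refers to; a different choice of the zero interval changes the intermediate numbers but not the result.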


If the distribution is not a normal one, the parameters $\bar{x}$ and
$\sigma_x$ (or $\bar{t}$ and $\sigma_t$) are not sufficient for describing the set or
calculating the probabilities. In such cases, we need to calculate also
the asymmetry and excess by the formulas

$$A = \frac{\mu_3(t)}{\sigma_t^3}, \qquad E = \frac{\mu_4(t)}{\sigma_t^4} - 3, \tag{18.11}$$

where $\mu_3(t)$ and $\mu_4(t)$ are the third- and fourth-order central moments
for the argument t. These two parameters are necessary if
we wish to approximate the empirical distribution that we are
investigating by a Charlier function or, as we say, to compare the
empirical distribution with the corresponding Charlier curve.

*We note, however, that calculation of the position of the average value has no
theoretical significance. If we obtain the wrong interval, this only means that the
calculations will be made somewhat more complicated.

Let us now clarify the meaning of the asymmetry and the excess.

If the distribution is symmetric about the center of distribution,
the central moments of odd order vanish and the asymmetry is
equal to 0. The less symmetric the distribution, the more will
A differ from 0. Consequently, $|A|$ increases with increase in the
asymmetry of the distribution.

If the relative maximum of the distribution lies to the right of
the center of distribution, $\mu_3$ will be positive, since the distribution
will then contain more cases with positive deviations from the
center than negative ones. The quantity $\sigma_t$ is positive by definition.
Therefore, we obtain from the formula a positive asymmetry. On
the other hand, if the maximum is located to the left of the center
of distribution, the asymmetry will be negative. Thus, the sign of
the asymmetry indicates the direction of the asymmetry of the
distribution curve.

The significance of the excess E is clarified if we remember
that, for a normal distribution, we have $\mu_4 = 3\sigma^4$. Therefore,
according to (18.11), the excess of a normal distribution is equal to 0.

For simplicity, we assume that the empirical distribution is 
symmetric. We also assume that the normal distribution corre- 
sponding to the empirical one (that is, having the same center and 
variance as the empirical one) has been constructed. 

If the vertex of the graph of the empirical distribution is higher
than the vertex of the normal one, E will be positive. If the vertex
of the empirical distribution is lower than the vertex of the normal
one, E will be negative. Thus, a positive excess indicates that the
empirical distribution curve is higher at the center than the normal
one (and therefore must be lower than the normal one at some
distance from the center, since the area bounded by each curve is
equal to unity). If the excess is negative, then the curve will be
lower at the center than the normal one (and at some distance
from the center, higher than the normal one).
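The asymmetry and excess of formulas (18.11) can be computed directly from multiplicities given in conventional units. The sketch below uses two small hypothetical tables, and the function name is an assumption.

```python
from fractions import Fraction
from math import sqrt

def asymmetry_and_excess(ts, counts):
    """Return (A, E) of formulas (18.11) for values ts with the given counts."""
    n = sum(counts)
    mean = Fraction(sum(c * t for c, t in zip(counts, ts)), n)

    def mu(r):
        # Central moment of order r with empirical probabilities c/n.
        return sum(c * (t - mean) ** r for c, t in zip(counts, ts)) / n

    sigma = sqrt(mu(2))
    return float(mu(3)) / sigma ** 3, float(mu(4)) / sigma ** 4 - 3

# Hypothetical symmetric table: the asymmetry vanishes.
print(asymmetry_and_excess([-1, 0, 1], [1, 2, 1]))
# Hypothetical table with a long tail toward positive t.
print(asymmetry_and_excess([-1, 0, 1], [5, 2, 1]))
```

Since A and E are dimensionless, the same values are obtained whether the moments are taken for the argument t or for the original variable x.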

If the distribution is asymmetric, the median and the mode are 
also calculated. 

As we have noted before, the median is a value of the variable
that exceeds one half the values in the empirical distribution. It
can be determined with the aid of the numbers $x_r$ and $x_{r+1}$ (which
are found in the same manner as for a discrete distribution). In
this case, however, we cannot confine ourselves to determining the
discrete value closest to the median, since the variable is
continuous. Therefore, we proceed as follows. After finding the two
values $x_r$ and $x_{r+1}$ of the midpoints of the intervals between which
the median lies, we evaluate the median by linear interpolation
from the numbers $x_r$ and $x_{r+1}$ and the corresponding numbers $N_r$
and $N_{r+1}$, by using the obvious formula

$$x_{me} = x_r + \frac{x_{r+1} - x_r}{N_{r+1} - N_r}\left(\frac{n}{2} - N_r\right). \tag{18.12}$$
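Formula (18.12) is a one-line interpolation. As a sketch, it is applied here to the numbers quoted in the worked example of this section (N = 86 at M = 0.0, N = 164 at M = 0.5, n = 189), where the numbers N refer to the end points of the intervals; the function name is hypothetical.

```python
def median_by_interpolation(x_r, x_next, N_r, N_next, n):
    """Linear interpolation (18.12) between two tabulated cumulative counts."""
    return x_r + (x_next - x_r) / (N_next - N_r) * (n / 2 - N_r)

median = median_by_interpolation(0.0, 0.5, 86, 164, 189)
print(round(median, 3))   # 0.054
```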




The mode of a theoretical distribution is the value of the vari- 
able at which the probability density has a relative maximum. In 
an empirical distribution, we may assume that the mode is equal to 
the value of the midpoint of the interval for which the multiplicity 
is greatest. If greater precision is desired, we choose a theoreti- 
cal density that satisfies the approximate empirical distribution 
sufficiently well and we take for the mode the value of the variable 
corresponding to the maximum of the assumed distribution law. 

Sometimes, as a result of mixing of data of various kinds, the
table of multiplicities will have more than one relative maximum. If
the relative maxima are stable, that is, if they do not disappear 
when the intervals are varied, the problem of dividing the data and 
determining the parameters of the constituent distributions may 
arise. Such a problem can be solved for two constituent normal 
distributions. 

The following example illustrates the procedure for calculating 
the moments up to the fourth order inclusive and the column of 
values of N, of an empirical distribution function (see Table A; 
Table B will be examined in the following section). 


Table A

Example. Analysis of the distribution of the absolute magnitudes
of stars of the spectral classes B5-B9. Calculation of the moments.

[Table A: columns I to VIII, as described in the explanation below. Last row: end point +2.0, N = 189.]


Explanation of Table A 


Column I shows the end points of the intervals of the absolute magnitude M. The
values of M corresponding to the midpoints of the intervals, which we denote by $M_k$,
might have been put in this column, but only one of them is used in practice and it is
easily determined by inspection. Therefore, there is no need to write them.

Column II contains the multiplicities in all the intervals indicated in column I. The 0
in the parentheses indicates that the multiplicities $n_k$ are taken from observation.

Column III gives the absolute magnitudes in conventional units (as measured from the
zero point corresponding to the mean value). These numbers are denoted by $t_k$. The
zero point is taken at the midpoint of that interval corresponding to the largest
multiplicity (in the present case 78).

The numbers in column IV are obtained by multiplying the numbers in columns II and
III. The sum of the numbers in column IV gives the first-order initial moment after
division by n, that is, the average value in units of t.

The numbers in column V are obtained by multiplying the numbers in columns IV and
III. The sum of the numbers divided by n gives the second-order initial moment.

Analogously, columns VI and VII are obtained by multiplying the preceding column by
the numbers $t_k$. When we add these numbers and divide the sum by n, we obtain the
third- and fourth-order initial moments.

Column VIII gives the (cumulative) empirical distribution of the variable in question. The
numbers $N_k$ apply to the end points of the intervals. The statistical meaning of these
numbers is the following: not one of the values is less than -2.0; one value is less than
-1.5; in four cases, the value is less than -1.0, and so on.
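The column arithmetic described above can be retraced with a short sketch. One caution is essential: the multiplicities below are not copied from Table A. They are an inference, chosen so that they reproduce the totals and cumulative counts quoted in the text (n = 189, the largest multiplicity 78, the sums -85, -283 and +1141, and N = 0, 1, 4, 86, 164, 189 at the end points named). They should be treated as an assumption, and the variable names are illustrative.

```python
from itertools import accumulate

# Assumption: multiplicities inferred from the printed sums and cumulative
# counts, not read from Table A itself.
multiplicities = [1, 3, 22, 60, 78, 20, 3, 2]
end_points = [-2.0 + 0.5 * k for k in range(9)]            # column I

zero_index = multiplicities.index(max(multiplicities))     # interval 0.0 to 0.5
t_values = [k - zero_index for k in range(len(multiplicities))]   # column III

def power_sums(ns, ts, max_order=4):
    """Sums of n_k * t_k^r for r = 0..max_order (column II and columns IV to VII)."""
    return [sum(n * t ** r for n, t in zip(ns, ts)) for r in range(max_order + 1)]

sums = power_sums(multiplicities, t_values)
cumulative = [0] + list(accumulate(multiplicities))         # column VIII

n = sums[0]
initial_moments = [s / n for s in sums[1:]]                 # nu_1 .. nu_4 in units of t

print(sums)         # [189, -85, 241, -283, 1141]
print(cumulative)   # [0, 1, 4, 26, 86, 164, 184, 187, 189]
print([round(v, 3) for v in initial_moments])
```

Dividing the sums by n gives the initial moments in conventional units that are used in the calculations following the table.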


Table B

Comparison of the empirical and the normal distributions

[Column headings not recoverable. Surviving entries, in the order printed:]

0.0
-0.4999    0.0016
-0.4983    0.0222
-0.4761    0.1322
-0.3439    0.3248
           0.3395
+0.3204    0.1496
+0.4700    0.0278
+0.4978    0.0021


Calculations Outside the Diagram 


(a) The initial moments in conventional units:

$$\bar{t} = \nu_1 = -85 : 189 = -0.450, \qquad \nu_1^2 = +0.202,$$

$$\nu_2 = +1.275, \qquad \nu_1^3 = -0.0911,$$

$$\nu_3 = -283 : 189 = -1.497, \qquad \nu_1^4 = +0.0410,$$

$$\nu_4 = +1141 : 189 = +6.037.$$

(b) The central moments in conventional units:

$$\mu_2 = +1.275 - 0.202 = 1.073, \qquad \sigma_t = \sqrt{1.073} = 1.036,$$

$$\mu_3 = -1.497 - 3 \cdot 1.275 \cdot (-0.450) + 2 \cdot (-0.0911) = +0.042, \qquad \sigma_t^3 = 1.112,$$

$$\mu_4 = 6.037 - 4 \cdot (-1.497) \cdot (-0.450) + 6 \cdot 1.275 \cdot 0.202 - 3 \cdot 0.0410 = 4.765, \qquad \sigma_t^4 = 1.151.$$


The Parameters (numerical characteristics) of an Empirical Distribution 


The average value. Here, the average value must be calculated in the original units,
that is, in the same units as the catalogued data. The zero point agreed upon
was taken at the midpoint of the interval between 0.0 and 0.5; that is, the value $M = 0.25$
corresponds to the zero point. The step in actual units is equal to 0.5. Therefore,


$$M = 0.25 + (-0.450) \cdot 0.5 = +0.025.$$


The median. In the table giving the numbers $N_k$ (column VIII), we have the following:
$N_k = 86$ when $M = 0.0$ and $N_k = 164$ when $M = 0.5$. We must find by linear
interpolation the value of $M$ at which $N_k = 94.5$ (half the total number of observations).
From formula (18.12), we obtain


$$M_{\text{med}} = 0.0 + \frac{94.5 - 86}{164 - 86} \cdot 0.5 = 0.054.$$
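
The interpolation for the median can be checked with a short routine. The function name and its argument names are our own; the numbers are those of the example.

```python
def interpolated_median(lower, step, count_below, count_through, n):
    """Linearly interpolate the point at which the cumulative count reaches n / 2.

    lower         -- lower end point of the interval containing the median
    step          -- length of that interval
    count_below   -- cumulative count N_k at the lower end point
    count_through -- cumulative count N_k at the upper end point
    """
    half = n / 2
    return lower + (half - count_below) / (count_through - count_below) * step


# Values from the example: N_k = 86 at M = 0.0, N_k = 164 at M = 0.5, n = 189.
median = interpolated_median(0.0, 0.5, 86, 164, 189)
print(round(median, 3))  # 0.054
```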


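The mean square deviation:

$$\sigma_M = 1.036 \cdot 0.5 = 0.513.$$


The asymmetry and the excess (according to formulas (18.11)):


$$\frac{\mu_3}{\sigma_t^3} = \frac{+0.042}{1.112} = +0.038, \qquad \frac{\mu_4}{\sigma_t^4} - 3 = \frac{4.765}{1.151} - 3 = +1.14.$$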


The mode. As was stated above, for the value of the mode of an empirical distribution
we may take the value at the midpoint of the interval with the greatest multiplicity. This
gives us a value of 0.25 (in the original units, that is, star magnitudes).

The mode can be determined with greater reliability if we can find, in analytical form,
a distribution law satisfactorily representing the empirical distribution. If we assume
that our material is satisfactorily represented by a normal law of distribution with
parameter $M = 0.025$, we may take the mode equal to 0.025. The computed parameters
of an empirical distribution give a description of the set with the aid of certain numbers.
From these, we may obtain certain general facts regarding the random variable in
question, without making comparisons with the theoretical distribution.

The value of the mean square deviation shows that the spread (variance) of the values
of $M$ is not great. The interval from $M - \sigma_M$ to $M + \sigma_M$ contains more than 70 per
cent of all the observational material. A slight difference between the average value and
the median shows that the distribution is approximately symmetric about the center of
distribution. This is supported by the small value of the asymmetry. The sign of the
asymmetry shows that the mode must be somewhat greater than the average value. The
value of the excess cannot be assumed small. This means that near the center there is a
preponderance of observations in comparison with what would be the case if the distribution
were normal. This is in agreement with the remark about the percentage of the
observations in an interval of length $2\sigma_M$ about the center of distribution.


96. COMPARISON OF THE EMPIRICAL AND THE 
THEORETICAL DISTRIBUTIONS 


A theoretical distribution function (or probability density) must, in the
general case, contain literal parameters. Comparison of the
distribution obtained from observations with some theoretical
distribution consists first in calculating the parameters of the
theoretical distribution from observational data, then in constructing
a theoretical distribution with the numerical values found for the
parameters, and finally in comparing this distribution with the empirical one.

The parameters of the theoretical distribution must be calculated 
so that it will represent observations in the best possible manner 
from some standpoint or other. Pearson suggested the method of 
moments for this. 

With this method, we need to calculate the second- and higher- 
order central moments and the first-order noncentral moments of 
the theoretical distribution and equate them to the corresponding 
moments of the empirical distribution. The moments of the 
theoretical distribution are functions of the literal parameters of the
distribution. The moments of the observed distribution are numbers 
obtained by processing the statistical material. When we equate 
these numbers to the functions of the parameters referred to (that 
is, the moments of the theoretical distribution), we obtain a system 
of equations for determining the parameters. 

For example, if the expression for the probability density contains
three parameters, the theoretical moments up to the third
order inclusive are equated to the corresponding empirical moments,
and we obtain three equations with
three unknowns. We note that the moments of zeroth order must
also be equal, but this condition will be automatically satisfied if
the probability density is normalized (that is, if its integral is
equal to unity). Equality of the first-order initial moments indicates
that the theoretical average and the empirical average must be
equal to each other. Equality of the second-order central moments
indicates equality of the variances or the mean square deviations.

Quite frequently, the empirical distribution is compared with the 
normal distribution. The probability density of a normal law con- 
tains two parameters. Therefore, to compare the observed dis- 
tribution with the normal distribution, we only need to equate the 
first- and second-order moments. This gives us


$$\bar{t} = \nu_1, \qquad \sigma^2 = \mu_2,$$


where $\bar{t}$ and $\sigma$ denote the parameters of a normal law and $\nu_1$ and $\mu_2$
are the computed moments of the empirical distribution (in the
conventional units).

After we obtain the parameters for a theoretical (e.g., normal) 
law, we must construct the theoretical distribution and then com- 
pare it with the empirical. Here, we may use various criteria 
regarding the closeness of the theoretical to the empirical distri- 
bution (agreement criteria). We give two such criteria here. 


1. Kolmogorov’s Criterion. 


The parameters of the theoretical distribution are used to calculate 
the probabilities that the random variable will take values less than 
the end points of the intervals of the argument of the distribution. 


If we multiply these probabilities by the total number of observations
$n$, we obtain numbers $N_k^{(c)}$, which are the values of the theoretical
distribution function. The superscript $(c)$ indicates that these
numbers pertain to the theoretical distribution.

Let us find the maximum absolute value of the difference between
the numbers $N_k^{(c)}$ and the values $N_k$ of the empirical
distribution function. If we denote this quantity by $D$,


$$D = \max_k \left| N_k^{(c)} - N_k \right|, \tag{18.13}$$

we obtain the argument $\lambda$ of Kolmogorov’s criterion:


$$\lambda = \frac{D}{\sqrt{n}}. \tag{18.14}$$


From the argument $\lambda$, we can use the table below to find the value
of $P(\lambda)$, which is the probability that the difference $D$ will exceed the
value obtained:


  $\lambda$       0.3     0.4     0.5     0.6     0.7     0.8     0.9     1.0
  $P(\lambda)$    1.000   0.997   0.964   0.864   0.711   0.544   0.393   0.270


The closer $P(\lambda)$ is to unity, the better the chosen theoretical distribution
represents the empirical distribution. If $P(\lambda)$ is small, the
theoretical distribution will not represent the observational material
satisfactorily.
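
The tabulated probabilities agree with the limiting Kolmogorov law, whose series expansion is standard. The following sketch evaluates that series and the argument of the criterion; the function names are ours, and the use of the series (rather than the printed table) is our choice, not a step of the method as described above.

```python
import math


def kolmogorov_p(lam, terms=100):
    """P(lambda): limiting probability that sqrt(n) * D_n exceeds lam.

    Uses the series 2 * sum_{k>=1} (-1)**(k-1) * exp(-2 * k**2 * lam**2),
    which converges quickly for lam of about 0.3 and larger.
    """
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))


def kolmogorov_argument(d_counts, n):
    """Argument lambda = D / sqrt(n), with D measured in numbers of observations."""
    return d_counts / math.sqrt(n)


for lam in (0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    print(lam, round(kolmogorov_p(lam), 3))
```

For arguments between the tabulated ones, such a function replaces linear interpolation in the table.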


2. The Precision Coefficient. 


Using the theoretical law, we calculate the probabilities that the
random variable studied will assume values contained within the
intervals of the given empirical distribution. If we multiply the
probabilities obtained by the total number of observations, we obtain
the theoretical multiplicities $c$. Comparison of these numbers
with the multiplicities $o$ obtained from observations may show how
close the observed distribution is to the theoretical distribution.

Where the theoretical multiplicities are small, two or three of the boundary
intervals are combined into one (see the example). The precision
coefficient $H$ is calculated from the formula


$$H = \sqrt{\frac{1}{s' - 1} \sum \frac{(o - c)^2}{c}}, \tag{18.15}$$


where $s'$ denotes the final number of intervals of the empirical distribution
(after combining), in contrast with the initial number of intervals. It may be
shown that the expectation of $H$ is equal to unity. Therefore, the
theoretical distribution constructed is close to the empirical one if
$H$ is close to unity.
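
A minimal sketch of this computation follows. It assumes that the precision coefficient has the form $H = \sqrt{\sum (o - c)^2 / c \,/\, (s' - 1)}$, with the sum taken over the $s'$ final intervals; the multiplicities in the example are hypothetical and serve only to show the mechanics, and the helper names are ours.

```python
import math


def precision_coefficient(observed, computed):
    """H = sqrt( sum((o - c)**2 / c) / (s' - 1) ) over the s' final intervals."""
    if len(observed) != len(computed) or len(observed) < 2:
        raise ValueError("need two or more matching intervals")
    total = sum((o - c) ** 2 / c for o, c in zip(observed, computed))
    return math.sqrt(total / (len(observed) - 1))


def combine_ends(values, k=2):
    """Merge the first k and the last k intervals into single boundary intervals."""
    return [sum(values[:k])] + list(values[k:-k]) + [sum(values[-k:])]


# HYPOTHETICAL multiplicities, for illustration only.
o = [2, 5, 18, 40, 22, 9, 3, 1]
c = [1.5, 6.0, 20.0, 36.0, 24.0, 8.5, 3.0, 1.0]

H = precision_coefficient(combine_ends(o), combine_ends(c))
print(round(H, 2))
```

Combining the boundary intervals before the division by $c$ avoids very small denominators, which is the reason given for the combination.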


Example. Table B on page 297 gives a plan for making the calculations that are
necessary for comparing the empirical distribution constructed in the previous section
with the normal distribution.

Column IX of Table B shows the end points of the intervals in conventional units, $t^{(k)}$.
Since the step in conventional units is equal to unity, we must subtract 0.5 from each of
the midpoint values $t$ of Table A to obtain the lower end points of the intervals and add
0.5 to obtain the upper end points.

In order to calculate the probability that the random variable will assume
values in each of the intervals according to formula (11.3) of Part III, we must know the
values of the probability integral at the lower and upper end points of each interval. The
argument of the integral is the deviation from the mean value
divided by the mean square deviation of the normal law. Therefore, column X gives the
deviations of the end points of the intervals from the mean value, that is,
the numbers $t^{(k)} - \bar{t}$ (where $\bar{t} = -0.450$). Column XI gives the quotients resulting from
dividing these numbers by $\sigma_t$ (where $\sigma_t = 1.036$). These quotients are the values of
the argument for the probability integral.

Column XII. For each of the numbers


$$\frac{t^{(k)} - \bar{t}}{\sigma_t}$$


in column XI, we determine from the table of the probability integral (Table III at the
end of the book) the number


$$\Phi\left(\frac{t^{(k)} - \bar{t}}{\sigma_t}\right).$$


If the argument of the function $\Phi$ is negative, $\Phi$ is taken for the absolute value of the
argument and the number obtained is assigned the sign $-$.

Column XIII gives the probabilities $p^{(c)}$ of falling into each of the intervals. If the
end points of any of the intervals are $\alpha$ and $\beta$ in conventional units, the probability of
falling into that interval is computed from the formula


$$p^{(c)} = \Phi\left(\frac{\beta - \bar{t}}{\sigma_t}\right) - \Phi\left(\frac{\alpha - \bar{t}}{\sigma_t}\right).$$


Therefore, to obtain the numbers $p^{(c)}$, we must subtract the immediately preceding
number of column XII from each number in that column (that is, we must calculate
the first-order differences of the numbers in column XII). Here, we need to note the
point in this column at which the sign changes from minus to plus, since for that interval
we must add the absolute values of the two values of the function $\Phi$. The probabilities
of falling in the different intervals obtained in this manner are shown with a
superscript $(c)$ to emphasize that they are calculated (under the assumption
that the distribution is satisfactorily approximated by a normal distribution law).
As a check, the sum of the numbers in the column should be approximately equal to
unity.

Column XIV. In order to avoid dealing with fractions, we multiply the numbers in
column XIII by the total number of observations; that is, we assume that the numbers
define probabilities and we calculate the theoretical multiplicities in the intervals, which
are denoted by $n_k^{(c)}$. As a check, the sum of the numbers $n_k^{(c)}$ must differ only
slightly from the total number of observations $n$. A discrepancy of several units is
possible because in the case of a normal theoretical distribution, the random variable
may formally take values from $-\infty$ to $+\infty$, and we are confining ourselves to a
finite region. In the present example, the sum differs from $n$ only by a unit in the reserve
digit. (The numbers $n_k^{(c)}$ are counted with tenth parts, which gives a reserve
digit.)
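
Columns IX to XIV can be reproduced with the error function in place of the printed table of the probability integral. In this sketch, `phi` is our name for the probability integral counted from zero, and the end points are assumed to run from -4.5 to +3.5 in conventional units, which is consistent with the values of $\Phi$ shown in Table B. Small differences in the last digit against the hand computation are to be expected, since the table look-up rounds the argument.

```python
import math

T_BAR, SIGMA_T, N = -0.450, 1.036, 189  # parameters quoted in the example


def phi(z):
    """Probability integral counted from zero: P(0 < Z < z) for a standard normal Z.

    It is odd in z, which reproduces the sign rule described for column XII.
    """
    return 0.5 * math.erf(z / math.sqrt(2.0))


# Column IX: ASSUMED interval end points in conventional units (-4.5 .. +3.5).
end_points = [k - 0.5 for k in range(-4, 5)]

# Column XII: probability integral at each end point.
col_xii = [phi((t - T_BAR) / SIGMA_T) for t in end_points]

# Column XIII: first-order differences of column XII.
p_c = [upper - lower for lower, upper in zip(col_xii, col_xii[1:])]

# Column XIV: theoretical multiplicities, kept to tenths (one reserve digit).
n_c = [round(N * p, 1) for p in p_c]

print([round(p, 4) for p in p_c])
print(n_c, round(sum(n_c), 1))
```

Because `phi` carries its sign, plain differences handle the interval in which the sign changes, so no special case is needed there.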

Column XV. From every term in column II, we subtract the corresponding number
in column XIV. The differences obtained are denoted by $o - c$ (the observed minus the
computed). The sum of the numbers obtained must be equal to the difference between
the sums of the numbers in columns II and XIV. In the calculation of the numbers in
column XV, the first two and the last two intervals of the table were combined because the
theoretical multiplicities were small, which would have made the determination
of the precision coefficient more difficult.* This column can be used for a qualitative
comparison of the observed distribution with the corresponding normal distribution.

In columns XVI and XVII, preparation is made for calculating the precision coefficient,
which is one of the criteria of reliability of the theoretical distribution (in the
present problem, a normal distribution). To calculate it, we need the sum of the numbers
in column XVII.

Column XVIII gives the values $N_k^{(c)}$ of the theoretical distribution function. These
values are obtained by successively adding the numbers in column XIV. (Thus, column XIV
consists of the first-order differences of the numbers in column XVIII.)


Comparison of the Empirical with the Normal Distribution 


The use of the asymmetry and the excess. This is the simplest method. It requires no
supplementary calculations other than those made for obtaining the parameters of the
empirical distribution. In a normal distribution, the asymmetry and the excess are both
equal to zero. In the present problem, the asymmetry has a sufficiently small numerical
value, but the excess cannot be considered sufficiently small. Thus, the distribution is
approximately symmetric about its center, like a normal distribution, but close to the
center, the probability density is greater than it is with a normal distribution.

The use of column XV (for $o - c$). Here, we have the differences between the observed
and computed multiplicities under the assumption that a normal law holds.
Examination of these numbers confirms the conclusion obtained from the excess:
there is a large positive deviation about the center and small negative deviations in
all the remaining intervals except the one adjacent to the center, where we obtain
-8.3. Consequently, the behavior of the numbers in the $o - c$ column is to some degree
systematic.

Calculation of the precision coefficient. From formula (18.15), where $s'$ is the number
of intervals in which the numbers $o - c$ are given, we use the sum of the numbers
in column XVII to obtain

$$H = \sqrt{\frac{\sum (o - c)^2 / c}{s' - 1}} = 1.2.$$

Since $H$ differs only slightly from unity, a normal distribution may be considered an
admissible approximation for the empirical distribution under consideration.

Kolmogorov’s criterion. Comparing the numbers in column XVIII with the numbers
in column VIII of the table on page 296, we find the maximum $D$ of the absolute value of
the difference $N_k - N_k^{(c)}$, namely, 8.9.

We calculate the value of the argument:


$$\lambda = \frac{D}{\sqrt{n}} = \frac{8.9}{\sqrt{189}} = 0.65.$$


From the table on page 300, we find the probability that the deviation $D$ will exceed the
number 8.9 that we have obtained above. Specifically, $P(\lambda) = 0.78$. Since the probability
of obtaining an even greater deviation than in the case that we have been examining
is considerably greater than 1/2, the approximation by a normal law can be considered
satisfactory.
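
The search for the maximum difference between the two cumulative columns may be sketched as follows. The observed multiplicities are an assumption (a reconstruction consistent with the sums and cumulative counts quoted in the example, not a copy of Table A); the computed multiplicities are obtained from the probabilities of column XIII of Table B, rounded to tenths.

```python
import math
from itertools import accumulate

# ASSUMED observed multiplicities (column II), reconstructed from quoted sums.
observed = [1, 3, 22, 60, 78, 20, 3, 2]
n = sum(observed)

# Column XIII of Table B, and column XIV derived from it (tenths kept).
p_c = [0.0016, 0.0222, 0.1322, 0.3248, 0.3395, 0.1496, 0.0278, 0.0021]
computed = [round(n * p, 1) for p in p_c]

N_emp = list(accumulate(observed))    # column VIII
N_theor = list(accumulate(computed))  # column XVIII

# D is the largest absolute gap between the two cumulative columns.
D = max(abs(a - b) for a, b in zip(N_emp, N_theor))
lam = D / math.sqrt(n)

print(round(D, 1), round(lam, 3))
```

With these inputs the largest gap occurs at the upper end point of the central interval, where the excess of observations near the center has accumulated.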


*The first two intervals above give $n^{(o)} = 4$ and $n^{(c)} = 4.5$. Therefore, $(o - c)$
becomes -0.5. An analogous situation holds in the last two intervals.


97. CONFIDENCE PROBABILITIES AND
CONFIDENCE LIMITS


The numerical characteristics of an empirical distribution considered
above (the average value, the mean square deviation, the
coefficient of variability, the mode, the median, the asymmetry,
the excess) are sufficient for an overall description of various
distributions, and some of them are sufficient also for comparing the
distributions of different variables. Thus, their purpose is to replace
the raw material (that is, the observational data) with numbers
that as completely as possible characterize the entire set of
observations. However, taken alone, these characteristics do not
enable us to predict the possible results of future observations of
the random variable. As was mentioned above, such a prediction
may consist in calculating the probability that the value of the
variable will be contained within certain limits. In practice, it is
assumed that such a specification of these limits can be considered
reliable if the probability is close to unity (for example, 0.99 or
0.999). In connection with this, we introduce the concepts of
confidence probabilities and confidence limits.

Definition: The confidence probability that a random variable 
will assume some value between specified limits is a value of this 
probability that (by agreement) will be considered sufficiently close 
to unity for the purposes of the problem. The corresponding limits 
are called confidence limits. 

If the confidence probability is close to unity, this means that the
event (falling within the limits) is virtually certain; that is, only
rarely will it fail to happen. For example, if the probability is equal
to 0.99, the law of large numbers (J. Bernoulli’s theorem) indicates
that when a large number of observations is made, the fraction
of failures of the event is close to 0.01, or 1%.

To calculate the confidence probabilities and the confidence 
limits, we must construct a theoretical distribution law by solving 
the problem of the approximation of the empirical distribution by 
means of a theoretical law chosen on some basis or other. The 
parameters of this law are determined by the method shown in 
Section 96. The distribution law that we construct (the probability 
density) must be checked against observations. With the help of 
the theoretical distribution function that we choose, we may set up 
tables for calculating the confidence probability from the given 
confidence limits and for solving the inverse problem. 

With the aid of Table II (at the end of the book), it is easy to
obtain, for a normal distribution with average value $\bar{x}$ and mean
square deviation $\sigma$, a table of confidence limits for
certain confidence probabilities (see Table A). We also give a table of
confidence probabilities for given confidence limits (see Table B).


Table A

Confidence probabilities     Confidence limits


Table B

Confidence limits     Confidence probabilities


Similar tables are constructed for the Charlier and Pearson 
distributions. 
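
For a normal law, both kinds of table can be generated directly. The sketch below expresses the confidence limits as $\bar{x} \pm k\sigma$ and computes either the probability for a given multiplier $k$ or the multiplier for a given probability. The function names and the particular values of $k$ and of the probability are our own choices and are not the entries of Tables A and B.

```python
import math
from statistics import NormalDist


def confidence_probability(k):
    """P(|X - mean| < k * sigma) for a normal law."""
    return math.erf(k / math.sqrt(2.0))


def confidence_limit(p):
    """Multiplier k such that mean +/- k * sigma holds with probability p."""
    return NormalDist().inv_cdf(0.5 * (1.0 + p))


for k in (1.0, 2.0, 3.0):
    print(k, round(confidence_probability(k), 4))
for p in (0.95, 0.99, 0.999):
    print(p, round(confidence_limit(p), 3))
```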


98. GRAPHICAL REPRESENTATION OF AN 
EMPIRICAL SET 


A graphical representation is made along with the numerical analysis
of an empirical set. The graphs made are then compared with
the accepted theoretical distribution.

Let us first examine the graphs of a discrete distribution.

The graph of a table of an empirical distribution is of a simple
form (see Fig. 12). The discrete values $x_k$ are laid off along the
horizontal axis and the corresponding values of the empirical
probabilities $p_k$ are laid off along the vertical axis. The graph consists of
isolated points. They can be connected by dashed lines (as shown in
the figure) in order to make the change in $p_k$ with respect to $x_k$
clearer from a visual standpoint. In the intervals between the
points representing the $x_k$ and outside the interval $[x_1, x_n]$, the
values of $p$ are equal to 0.


Fig. 12. Graphs of a probability distribution and of a distribution function of a
discrete random variable.


The graph of a distribution function is of the following form:
From $x = -\infty$ to $x = x_1$, the graph coincides with the horizontal
axis. There is a discontinuity at $x = x_1$: to the left, $P = 0$, and to
the right, $P = P_1 = p_1$ up to the value $x = x_2$. At that point, there is
another discontinuity: to the left, the value is $p_1$, and to the right
it is $p_1 + p_2$ up to the value $x = x_3$, etc. Thus, the graph consists of
a horizontal semi-axis, of a line segment parallel to the horizontal axis at a
distance $p_1$ above it between the values $x_1$ and $x_2$ of the argument,
of another line segment parallel to the horizontal axis at a distance
$p_1 + p_2$ above it between the values $x_2$ and $x_3$, and so on until we
obtain a segment between $x = x_{n-1}$ and $x = x_n$ parallel to the
horizontal axis at a distance $1 - p_n$ above it, and finally a horizontal
half-line at unit distance above the axis from $x_n$ to $\infty$. Such a
graph is frequently referred to as a "step graph."
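
The step graph corresponds to a function that is easy to tabulate. The sketch below assumes the convention that the distribution function gives the probability of values less than $x$, so that at a jump point the function still has its left-hand value; the distribution itself is hypothetical, and the name `step_distribution` is ours.

```python
from bisect import bisect_left
from itertools import accumulate


def step_distribution(xs, ps):
    """Return F with F(x) = P(X < x) for sorted values xs and probabilities ps."""
    cumulative = list(accumulate(ps))

    def F(x):
        # Number of tabulated values strictly less than x.
        i = bisect_left(xs, x)
        return 0.0 if i == 0 else cumulative[i - 1]

    return F


# HYPOTHETICAL discrete distribution, for illustration only.
xs = [1, 2, 3, 4]
ps = [0.1, 0.4, 0.3, 0.2]
F = step_distribution(xs, ps)
print([F(x) for x in (0.5, 1, 1.5, 2.5, 4, 4.5)])
```

Between the last two values the function stands at $1 - p_n$, and beyond $x_n$ it equals unity, in agreement with the description of the graph.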

In the case of a continuous distribution, the values of the random
variable are laid off on the horizontal axis. To construct the graph
of the distribution function, at those points on the horizontal axis
that represent the upper end points of the intervals we construct
ordinates of length $N_k$ or $N_k/n$. At the lower end point of the first
interval, we put a dot on the horizontal axis. If $N_k$ is laid off on the
ordinates, the length of the last ordinate will be equal to the total
number of cases. However, if the empirical probabilities are laid
off, the length will be equal to unity.

In contrast with the graph of a discrete variable, here the points
must be joined, since arbitrary intermediate values of our variable
are possible. Ordinarily, the points in question are connected by
straight line segments. The graph consists of a broken line beginning
on the horizontal axis and ending at the point whose ordinate
is equal to $n$ or 1. No ordinate of the broken line can be lower than
the preceding one. A graph of this form is called an ogive (see Fig.
13). The ogive is used for determining the median. A straight line
parallel to the horizontal axis is drawn through the midpoint of
the last ordinate on the right and is extended until it intersects the
ogive, and a perpendicular is dropped from the point of intersection
to the horizontal axis. The point at which this perpendicular intersects
the horizontal axis approximately represents the median.


Fig. 13. The graph of an empirical distribution function. The dashed
lines show how the median is determined graphically (see the example
on page 296).


To construct the graph of an empirical probability density, on
each interval as a base we construct a rectangle whose area
is equal either to the number of events or to the relative frequency.
In the first case, the area of the entire graph is equal to the number
of outcomes, and in the second case it is equal to unity. It
follows from the method of construction that the altitude of each
rectangle represents the average number of outcomes that take
place per unit of the corresponding interval or the mean statistical
probability per unit of interval. We may say that the altitude of the
rectangle is the average density of the empirical probability that
the value of the variable will fall in that interval.

Graphs of such a shape are called histograms. Sometimes, in- 
stead of a histogram, we construct a polygon, which can be obtained 
from a histogram if we draw straight line segments connecting the
midpoints of the upper sides of adjacent rectangles (see Fig. 14). A 
graphical comparison of the empirical distribution with the theoreti- 
cal distribution can be made by means of the histograms of these 
distributions and polygons. 
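
The construction of the altitudes can be sketched as follows. The grouped data are hypothetical, and the function names are ours; the point of the example is only that the altitude is the multiplicity (or relative frequency) divided by the length of the step, so that the areas, not the altitudes, carry the frequencies.

```python
def histogram_heights(counts, step, relative=True):
    """Altitudes of rectangles whose AREAS equal the (relative) frequencies."""
    n = sum(counts)
    scale = n if relative else 1
    return [c / (scale * step) for c in counts]


def polygon_vertices(midpoints, heights):
    """Vertices of the polygon: midpoints of the upper sides of the rectangles."""
    return list(zip(midpoints, heights))


# HYPOTHETICAL grouped data with step 0.5, for illustration only.
midpoints = [-0.75, -0.25, 0.25, 0.75]
counts = [10, 40, 35, 15]
step = 0.5

heights = histogram_heights(counts, step)
area = sum(h * step for h in heights)
print(heights, area)
print(polygon_vertices(midpoints, heights))
```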


Fig. 14. A histogram (solid lines) and a polygon (dashed lines) of the
distribution of a continuous random variable. The areas of the rectangles
are proportional to the probabilities of the values of the
variable represented by the bases of the rectangles.


Example. Let us compare the empirical with the theoretical distribution for the example
examined in Sections 95 and 96 (see Tables A and B on pages 296 and 297).

Let us draw the histograms of the empirical and normal distributions on a single
diagram (Fig. 15). The empirical histogram is drawn in solid lines from the table
of values of $n_k$ (column II). The histogram of the normal distribution, which approximates
the given distribution, is drawn in dashed lines, using the numbers $n_k^{(c)}$ (column
XIV). On such a graph, it is not necessary that the total area be equal to unity or to the
total number of observations, since this requirement can always be satisfied by a
suitable choice of scale on the coordinate axis. Therefore, the altitudes of the rectangles
can be taken in proportion to the number of cases (for the observed distribution, the
numbers in column II, and for the theoretical distribution, those in column XIV).


Figure 15 shows a comparison of the histograms.

Figure 16 shows the Gaussian curve and the polygon of the frequencies. This polygon
may be treated as the graph of the probability density of the empirical distribution.

The polygon of the frequencies is a broken line drawn through the basic points (that
is, the points $(t_k, n_k)$). Additional calculations necessary for drawing the Gaussian curve
in a form convenient for making the comparison are given in the following table.


Fig. 15. Comparison of the histograms of the empirical (solid lines) and the
theoretical (dashed lines) distributions.


The fifth column gives the values of the probability density at
the points shown in the first column. The sixth column gives the
products $y_c$ of these densities and the total number of observations.
The last column gives the numbers $y_o$ for comparison with them.
These numbers are proportional to the empirical probability density
and are obtained by dividing the multiplicities by the length of the
step (namely, 0.5).*


Fig. 16. Comparison of the polygons of the empirical (solid line) and
the normal (dashed line) distributions.


Both drawings confirm the conclusion that the overall agreement
with the normal distribution is satisfactory.


*The fifth row, corresponding to the value 0.025, is included to indicate the center of the
theoretical distribution. In the empirical material, there is no corresponding interval,
hence the dash.


99. THE AVERAGE ERRORS OF THE PARAMETERS
OF A SAMPLE SET


As was stated above, in the majority of problems the empirical
sets are sample sets. Therefore, the conclusions drawn from
empirical distributions cannot be considered sufficiently reliable
without supplementary investigation.

On the basis of the law of large numbers, when the sample is
sufficiently large, we may expect with a probability close to unity
that the parameters of the sample set will be arbitrarily close to
the parameters of the total set. Therefore, the numerical
characteristics of a sufficiently large sample may be considered as
approximate values of those probability parameters of the
distribution that have to be calculated for the total set. For the same
reason, those probabilities that are computed from an empirical
distribution should be considered as approximate values of the
probabilities which we would need to calculate from data concerning
the whole set (if this were possible).

We can get some idea of the unknown total set if we find, at
least approximately, the mean square errors of the parameters of
the sample set, that is, the variance of these parameters with
respect to the parameters of the general set.

To evaluate these mean errors, we reason as follows. Let us
suppose that we can consider every possible sample of the total
set. In each sample, we would obtain some value $\bar{x}$. (For brevity,
we shall speak only of $\bar{x}$, though our remarks apply also to other
distribution parameters.) The set of the sample values $\bar{x}$ represents
a certain distribution. If we derive the distribution function
of $\bar{x}$, we can evaluate the probabilities of various sample
values of $\bar{x}$. However, when we have one sample value, we can use
the distribution parameters to find out how much the sample mean
value may differ from the overall one.

Study of the sample values of $\bar{x}$ and $\sigma_x$ in an overall set that
obeys a normal law has yielded the following conclusions:

The law of distribution of the sample mean values $\bar{x}$ is a normal
law, and the expression for the mean square error of the value of
$\bar{x}$ is


$$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}}. \tag{18.16}$$


However, the law of distribution of the sample values of the mean square
deviation $\sigma_x$ is not a normal law. For the mean square error $\sigma(\sigma_x)$ of
the sample mean square deviation, we have the following value:


$$\sigma(\sigma_x) = \frac{\sigma_x}{\sqrt{2n}}. \tag{18.17}$$


Example. In the example examined above,


$$M = +0.025, \qquad \sigma_M = 0.513, \qquad n = 189.$$


Therefore,

$$\sigma_{\bar{M}} = \frac{0.513}{\sqrt{189}} = 0.037, \qquad \sigma(\sigma_M) = \frac{0.513}{\sqrt{378}} = 0.026.$$

We may summarize the contents of this chapter with the follow- 
ing procedure for analyzing a one-dimensional statistical set: 

1. An initial processing. 

2. Determination of the numerical characteristics of the em- 
pirical set (the average value, the mean square deviation, the 
asymmetry, the excess, the mode, and the median). 

3. Determination of the parameters of the chosen theoretical 
distribution by the method of moments. 

4. Construction of the theoretical distribution from the param- 
eters found. 

5. Numerical comparison of the theoretical distribution with the 
empirical distribution by use of the agreement criteria. 




6. Construction of the graphs of the empirical and theoretical 
distributions and comparison of these. 

7. Calculation of the average errors of the parameters of the 
empirical set (if it is a sample set). 


Chapter 19 


ELEMENTARY THEORY OF THE 
CORRELATION OF TWO VARIABLES 


100. THE EMPIRICAL DISTRIBUTION OF TWO 
RANDOM VARIABLES 


Suppose that a statistical set consisting of $n$ objects (observations)
is given and that for each of them we have obtained empirically
numerical values for the random variables $X$ and $Y$. We confine
ourselves to the case of continuous variables. The results of the
observations can be first written in a simple table giving the
corresponding values of $x_k$ and $y_k$. In what follows, we shall denote
a table of this kind by the letter A:

  $x$     $x_1$    $x_2$    ...    $x_n$
  $y$     $y_1$    $y_2$    ...    $y_n$          (A)

Table A is of exactly the same form as the table (the list of the
results of observations) from which we derived an empirical formula
relating $X$ and $Y$ in a preceding chapter. However, at that
time, it was assumed that the variables $X$ and $Y$ were related by a
functional dependence whose nature we did not know and which was
distorted by observational errors. In the present case, we shall not
make this assumption regarding the variables $X$ and $Y$.

The distinctive feature of the statistical relation that we shall
study in the present chapter consists in the fact that to each value
of one variable there corresponds an indefinite number of values of
the other, among which there may be both unequal and equal values.
This characteristic can be made clearer if we make two other
tables, which we shall denote by $B_1$ and $B_2$. For this, let us choose
different values of the variable $X$, take them for the values of the
argument, and assign to each of these values all the values of the
variable $Y$ that appear with it in Table A. We thus obtain Table $B_1$:


  $x^{(1)}$:   $y_{11},\ y_{12},\ \ldots,\ y_{1 r_1}$
  $x^{(2)}$:   $y_{21},\ y_{22},\ \ldots,\ y_{2 r_2}$          ($B_1$)
  . . . . . . . . . . . . . . . .
  $x^{(k)}$:   $y_{k1},\ y_{k2},\ \ldots,\ y_{k r_k}$


In this table, $x^{(1)}, x^{(2)}, \ldots, x^{(k)}$ are different values of $X$. The numbers
with double subscripts represent the corresponding values of
$Y$.

In just the same way, we obtain Table $B_2$, which differs from
Table $B_1$ in that the places of the values of $X$ and $Y$ are reversed:


  $y^{(1)}$:   $x_{11},\ x_{12},\ \ldots,\ x_{1 s_1}$
  . . . . . . . . . . . . . . . .                          ($B_2$)
  $y^{(m)}$:   $x_{m1},\ x_{m2},\ \ldots,\ x_{m s_m}$

Table A is used directly for analysis of the material when the
amount of data is quite small. We rarely construct Tables $B_1$ and $B_2$; tables
of that sort are needed for making certain fundamental
concepts clearer.

If the number of observations in Table A is great (a hundred or
more), we construct Table C, which is the two-dimensional analogue
of the distribution table of probabilities that we constructed in the
preceding chapter for a single continuous variable. Table C is
constructed as follows. As in the case of a single variable, we partition
the interval of variation of each variable into a number of equal
subintervals and we count the number of times that $X$ and $Y$ assume
values in each of the pairs of corresponding intervals:


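  $Y$ \ $X$            | $x_1$     $x_2$     ...   $x_k$     ...   $x_r$     | Distribution of $Y$
  ---------------------+-------------------------------------------------------+--------------------
  $y_1$                | $n_{11}$  $n_{21}$  ...   $n_{k1}$  ...   $n_{r1}$  | $n_{01}$
  $y_2$                | $n_{12}$  $n_{22}$  ...   $n_{k2}$  ...   $n_{r2}$  | $n_{02}$
  ...                  | ...       ...       ...   ...       ...   ...       | ...
  $y_l$                | $n_{1l}$  $n_{2l}$  ...   $n_{kl}$  ...   $n_{rl}$  | $n_{0l}$          (C)
  ...                  | ...       ...       ...   ...       ...   ...       | ...
  $y_s$                | $n_{1s}$  $n_{2s}$  ...   $n_{ks}$  ...   $n_{rs}$  | $n_{0s}$
  ---------------------+-------------------------------------------------------+--------------------
  Distribution of $X$  | $n_{10}$  $n_{20}$  ...   $n_{k0}$  ...   $n_{r0}$  | $n$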


The first row of the table contains values of X corresponding to 
the midpoints of the intervals of that variable. (The end points of 
the intervals are not written down.) The first column contains the 
midpoints of the intervals of the variable Y. The quantities with 
double subscripts are the frequencies, that is, the numbers of 
observations whose values of x and y fall in the intervals corresponding 
to the rectangle in which the number is written. If the steps of the 
variables X and Y are equal to g and h, respectively, the number $n_{kl}$ 
is the number of observations in which X takes values from 
$x_k - \frac{1}{2}g$ to $x_k + \frac{1}{2}g$ and Y at the same time takes values from 
$y_l - \frac{1}{2}h$ to $y_l + \frac{1}{2}h$. The last column is obtained by adding all the 
numbers $n_{kl}$ in each row. The resulting column gives the distribution 
of the variable Y for arbitrary values of X. The last row is obtained by 
adding the numbers $n_{kl}$ in the individual columns. It represents the 
distribution of the variable X. The sum of all the numbers $n_{kl}$ (for 
$k \neq 0$, $l \neq 0$) must be equal to the number of observations n. 
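
The construction of a correlational table can be sketched in a few lines of code. The following is only an illustration under stated assumptions: the function name `correlation_table`, the parameters `x0` and `y0` (lower end points of the first intervals) and the sample data are not from the text, and every observation is assumed to fall inside the grid.

```python
def correlation_table(xs, ys, x0, g, r, y0, h, s):
    """Count observations in an r-by-s grid of rectangles.

    x0, y0: lower end points of the first intervals of X and Y (assumed names).
    g, h:   steps of X and Y;  r, s: numbers of intervals of X and Y.
    Returns (table, dist_y, dist_x, n); table[l][k] plays the role of n_kl.
    Every observation is assumed to lie inside the grid.
    """
    table = [[0] * r for _ in range(s)]
    for x, y in zip(xs, ys):
        # An observation on the upper end point is put into the last interval.
        k = min(int((x - x0) // g), r - 1)  # interval of X
        l = min(int((y - y0) // h), s - 1)  # interval of Y
        table[l][k] += 1
    dist_y = [sum(row) for row in table]        # last column: distribution of Y
    dist_x = [sum(col) for col in zip(*table)]  # last row: distribution of X
    n = sum(dist_y)                             # total number of observations
    return table, dist_y, dist_x, n


# Made-up observations, three intervals of X and two of Y, both with step 1.
xs = [0.5, 1.5, 1.6, 2.9]
ys = [0.2, 0.7, 1.9, 1.1]
table, dist_y, dist_x, n = correlation_table(xs, ys, 0.0, 1.0, 3, 0.0, 1.0, 2)
print(table, dist_y, dist_x, n)
```

The row and column sums reproduce the two marginal distributions, and their totals agree with the number of observations, as the text requires.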

Tables of this kind are called correlational tables (or joint 
tables). Setting up such a table is not recommended when the 
number of events is small, since it is assumed in the analysis 
that the number of events in each rectangle refers to the midpoints 
of the intervals and this may give appreciable errors when the num- 
ber of observations is small. 

The empirical distribution of two continuous variables can be 
represented graphically. We take the numbers x and y in Table A as 
rectangular plane coordinates and we plot the points with coordi- 
nates $x_m$, $y_m$ (for m = 1, 2, ..., n). We then obtain a set of points 
scattered over the plane. This set is known as the field of correla- 
tion. 

Let us find the average values $\bar{x}$ and $\bar{y}$ of each of the variables 
separately. The point in the field with coordinates $\bar{x}$ and $\bar{y}$ is 
called the center of distribution. When a field of correlation has 
been set up, it is easy to make a correlational table like Table C. 
For this it is sufficient to construct a coordinate net passing 
through the points that determine the end points of the intervals. 
The plane is partitioned into rectangles. If we count the number of 
points in each rectangle, we obtain a correlational table. 

On each rectangle we construct parallelepipeds of height such 
that the volume of each parallelepiped is equal to the ratio of the 
number of events in the rectangle to the total number of events. 
The three-dimensional step figure thus obtained may be called the 
empirical surface of the distribution or the histogram of the two- 
dimensional problem. This is a natural generalization of the 
method of graphical representation of the distribution of a single 
variable, as described above. 


101. CORRELATIONAL DEPENDENCE. PROBLEMS IN THE 
THEORY OF CORRELATION 


The relationship between X and Y defined by their distribution is 
usually quite different from what we call a functional relationship 
between two variables. In the case of a one-to-one functional de- 
pendence, to each value of x there corresponds one and only one 
value of y and vice versa. It is clear from Tables $B_1$ and $B_2$ that 
to one value of x there may correspond any number of values of y 
and to one value of y there may correspond several values of x. 
Also, they may be quite different one from the other. The number 
of these different values varies so irregularly that it is not 
convenient to speak of a multiple-valued functional dependence. 


Examination of the correlational Table C also shows a deviation 
from a functional dependence. When the values of one of the vari- 
ables lie in some interval, the other variable may take values be- 
tween rather wide bounds. This characteristic becomes more 
obvious when we consider the field of correlation. 

In the case of an exact functional relationship between X and Y, 
the points in the field must lie along some quite definite curve. If 
there is an exact functional relationship between X and Y, but the 
numbers in Table A are obtained from observations, the presence 
of random errors will still keep the points of the field from lying 
precisely on the theoretical curve. Nonetheless, if the errors are 
small, the points must lie in some narrow strip along the theoretical 
curve. In the case of a correlational dependence, however, the 
points of the field are located more or less at random in the plane. 

To study the peculiarities of a correlational connection, we 
supplement Table $B_1$ with another column giving the average values 
of the variable Y for each of the values of X that appear in the first 
column. The first and last columns of Table $B_1$ in its extended 
form determine the dependence of the average values $\bar{y}_x$ of the 
variable Y on the corresponding values of the variable X. We ex- 
tend Table $B_2$ in the same manner and obtain the empirical de- 
pendence of the average values $\bar{x}_y$ of the variable X on the corre- 
sponding values of the variable Y. The two tables $(x, \bar{y}_x)$ and 
$(y, \bar{x}_y)$, respectively, are said to define the empirical regression of 
Y with respect to X and of X with respect to Y. 
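
As a small illustration of the empirical regression, the sketch below groups the observed pairs by the value of the first variable and averages the second. The function name `empirical_regression` and the data are assumptions made for this example only.

```python
from collections import defaultdict


def empirical_regression(pairs):
    """Return {x: (average of the y observed with x, number of such y)}."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)  # one row of the extended table per distinct x
    return {x: (sum(v) / len(v), len(v)) for x, v in sorted(groups.items())}


# Made-up observations (x, y); some values of x are repeated.
pairs = [(1, 2), (1, 4), (2, 3), (3, 5), (3, 7), (3, 9)]

# Empirical regression of Y with respect to X.
y_on_x = empirical_regression(pairs)

# Empirical regression of X with respect to Y: the roles are reversed.
x_on_y = empirical_regression([(y, x) for x, y in pairs])

print(y_on_x)
print(x_on_y)
```

The count stored next to each average is the number of values in the corresponding row; it is the weight that such an average must carry in later calculations.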

Examination of similar tables leads to the following con- 
clusion. When the variable x increases, the corresponding values 
of y in the individual observations may both increase and decrease. 
To a single value of x there may correspond both large and small 
values of y. The same statement holds if we reverse the letters x 
and y. Even so, in many problems, the average values of a single 
variable corresponding to the values of the other disclose a certain 
dependence on the values of the second variable. Sometimes we 
may even speak of a quasi-functional dependence of the average 
values of one variable on the corresponding values of the other. 

In those cases in which the distribution of the two variables 
exhibits these features, we say that there is a correlational de- 
pendence, or simply a correlation, between the variables. Thus, 
there is a correlation between two random variables if to every 
value of one there corresponds an indefinite number of values of 
the other, but the averages of these values depend on the values of 
the first variable. 

A one-to-one functional relationship between the variables 
X and Y can be considered as a special case of correlational de- 
pendence. If to each value of x there corresponds only one well- 
defined value of y and vice versa and if all points of the field are 
located on the curve no matter how greatly we increase the number 
of them, a correlational dependence becomes a functional de- 
pendence. 

Construction of fields of correlation for different pairs of 
random variables yields varied results. Sometimes, the points of 
the field are scattered at random and at other times they are almost 
all located along some imaginary curve or straight line. Thus, the 
correlational dependence can to a greater or lesser degree deviate 
from a functional dependence. In connection with this, the first 
problem in the theory of correlation is the derivation of a numeri- 
cal criterion for evaluating the degree of closeness of a correla- 
tional dependence to a functional dependence. If out of all the ob- 
servations we pick only two quantities, there can of course be no 
functional relationship between them since these quantities are 
related with others that we have not taken into consideration. In 
applications, it is important for us to know how to choose those 
quantities that are most closely related with each other. It is 
specifically for solving such problems that a numerical criterion 
for the degree of connection is introduced. 

It was shown in the preceding section that the average values of 
one variable that are calculated for a finite interval of the other 
disclose a dependence on the values of the second variable. There- 
fore, the second problem to be considered in the theory of correla- 
tion consists in deriving empirical formulas for determining the 
average values of one variable from the values of the other. There 
are two such formulas. They are called the equations of regression. 

We shall confine ourselves to the theory of linear correlation. 
This means, first, that we shall derive a criterion for the degree 
of deviation of the correlational dependence from a linear func- 
tional dependence and, second, that we shall derive linear empirical 
formulas of the form 

$$ \bar{y}_x = a + bx, \qquad \bar{x}_y = c + dy \tag{19.1} $$

for determining the average value of each variable from the values 
of the other variable. 


102. DERIVATION OF A LINEAR EMPIRICAL FORMULA 


To solve the problem of deriving the equations of regression, we 
first solve the problem of setting up a linear empirical formula in 
a form that will be convenient for obtaining the lines of regression. 

Suppose that we obtain from observations a table like Table A 
giving the values of $x_k$ and the values of $y_k$ (for k = 1, 2, ..., n) 
corresponding to them. We need to construct an empirical formula 
of the form 

$$ Y = a + bX. \tag{19.2} $$

According to Section 90, to determine the coefficients a and b, we 
obtain the conditional equations 

$$ a + b x_k - y_k = 0 \qquad (k = 1, 2, \ldots, n). \tag{19.3} $$


From these, we set up two normal equations whose coefficients it 
is convenient to write not in Gauss' notation but in the form of the 
sums 

$$ na + b\sum_{k=1}^{n} x_k - \sum_{k=1}^{n} y_k = 0, \tag{19.4} $$

$$ a\sum_{k=1}^{n} x_k + b\sum_{k=1}^{n} x_k^2 - \sum_{k=1}^{n} x_k y_k = 0. \tag{19.5} $$


Let us now suppose that all the $x_k$ and $y_k$ are equally probable. 
Then, if we divide $\sum x_k$ by n, we get the average value $\bar{x}$ of x and if 
we divide $\sum y_k$ by n, we get the average value $\bar{y}$ of y. Therefore, the 
normal equation (19.4), after division by n, can be written in the 
form 

$$ a + b\bar{x} - \bar{y} = 0. \tag{19.6} $$

To transform equation (19.5), we also divide it by n. The coef- 
ficient 

$$ \frac{1}{n}\sum_{k=1}^{n} x_k^2 $$

can be replaced with the expression $\sigma_x^2 + \bar{x}^2$, which is formally ob- 
tained if we determine the variance by the following formula: the 
variance is equal to the difference between the expectation of the 
square of a random variable and the square of its expectation. To 
transform the last term in the equation (19.5), we introduce the 
notation 

$$ \nu_{11} = \frac{1}{n}\sum_{k=1}^{n} x_k y_k, \qquad \mu_{11} = \nu_{11} - \bar{x}\bar{y}. $$
kal 
If we generalize the concept of moments of distribution to the set 
of two random variables and consider all pairs $(x_k, y_k)$ as equally 
probable, we may call $\nu_{11}$ the initial moment of order "one-one" of 
the distribution of (X, Y), and $\mu_{11}$ the central moment of the same 
order. 
After these changes are made, the second equation becomes 

$$ a\bar{x} + b(\sigma_x^2 + \bar{x}^2) - (\bar{x}\bar{y} + \mu_{11}) = 0. $$

Let us rewrite it as 

$$ a\bar{x} + b\bar{x}^2 - \bar{x}\bar{y} + b\sigma_x^2 - \mu_{11} = 0. $$

The sum of the first three terms, $\bar{x}(a + b\bar{x} - \bar{y})$, is equal to zero 
because of (19.6) and we obtain the following system of equations: 

$$ a + b\bar{x} = \bar{y}, \qquad b\sigma_x^2 = \mu_{11}. \tag{19.7} $$


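Therefore, 

$$ b = \frac{\mu_{11}}{\sigma_x^2}, \qquad a = \bar{y} - \frac{\mu_{11}}{\sigma_x^2}\,\bar{x}. \tag{19.8} $$

The linear empirical formula (19.6) derived from the table of 
$(x_k, y_k)$ can now be written in the form 

$$ Y - \bar{y} = \frac{\mu_{11}}{\sigma_x^2}\,(X - \bar{x}). \tag{19.9} $$

Its parameters can be calculated from the formulas 

$$ \bar{x} = \frac{1}{n}\sum_{k=1}^{n} x_k, \qquad \bar{y} = \frac{1}{n}\sum_{k=1}^{n} y_k, $$

$$ \sigma_x^2 = \frac{1}{n}\sum_{k=1}^{n} x_k^2 - \bar{x}^2, \qquad \mu_{11} = \frac{1}{n}\sum_{k=1}^{n} x_k y_k - \bar{x}\bar{y}. \tag{19.10} $$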


It was assumed in the derivation that all the pairs $(x_k, y_k)$ are 
equally probable. This is equivalent to the assumption that every 
pair occurs exactly once in the table of values. If each pair of 
numbers $(x_k, y_k)$ is encountered $n_k$ times in the table and if the num- 
ber of different pairs is equal to s, the probability of each pair and 
of each of the numbers taken separately must be considered equal 
to $n_k/n$. Therefore, the above equations are replaced with the 
following ones: 

$$ \bar{x} = \frac{1}{n}\sum_{k=1}^{s} n_k x_k, \qquad \bar{y} = \frac{1}{n}\sum_{k=1}^{s} n_k y_k, $$

$$ \sigma_x^2 = \frac{1}{n}\sum_{k=1}^{s} n_k x_k^2 - \bar{x}^2, \qquad \mu_{11} = \frac{1}{n}\sum_{k=1}^{s} n_k x_k y_k - \bar{x}\bar{y}. \tag{19.11} $$

These formulas may be called the "weight equations" since the 
numbers $n_k$ play the role of weights in the calculations. 
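
That the weight equations give the same parameters as the unweighted formulas applied to the table written out pair by pair can be checked numerically. The sketch below is illustrative only; the function name `weighted_parameters` and the data are assumptions.

```python
def weighted_parameters(xs, ys, ns):
    """Return (x_bar, y_bar, var_x, mu11) with weights ns (multiplicities)."""
    n = sum(ns)  # total number of observations
    x_bar = sum(w * x for w, x in zip(ns, xs)) / n
    y_bar = sum(w * y for w, y in zip(ns, ys)) / n
    var_x = sum(w * x * x for w, x in zip(ns, xs)) / n - x_bar ** 2
    mu11 = sum(w * x * y for w, x, y in zip(ns, xs, ys)) / n - x_bar * y_bar
    return x_bar, y_bar, var_x, mu11


# Three different pairs, met 2, 1 and 3 times respectively (made-up data).
xs, ys, ns = [1, 3, 5], [2, 4, 9], [2, 1, 3]
weighted = weighted_parameters(xs, ys, ns)

# The same table written out in full, every pair with weight 1.
full_x = [1, 1, 3, 5, 5, 5]
full_y = [2, 2, 4, 9, 9, 9]
plain = weighted_parameters(full_x, full_y, [1] * len(full_x))

print(weighted)
print(plain)
```

The two results agree up to rounding, which is all that the weights are meant to guarantee.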

We have constructed an empirical linear formula expressing Y 
in terms of X. Of course, X and Y may represent different physical 
quantities, but from a computational point of view, they are com- 
pletely equivalent. Therefore, besides the empirical formula that 
we have constructed, we may also construct an empirical formula 
expressing X in terms of Y. Obviously, it will be a consequence of 
the first formula only if X and Y are related by an exact linear 
dependence. Otherwise, the second formula will not be a conse- 
quence of the first, but will have an independent significance. We 
shall not derive the second formula (it is recommended that the 
reader do this himself) and shall give only its final form for the 
case of equal probability of $(x_k, y_k)$: 

$$ X - \bar{x} = \frac{\mu_{11}}{\sigma_y^2}\,(Y - \bar{y}), \tag{19.12} $$

where 

$$ \sigma_y^2 = \frac{1}{n}\sum_{k=1}^{n} y_k^2 - \bar{y}^2. \tag{19.13} $$

The remaining parameters have the same expressions as in the 
preceding case. Here, the quantity $\mu_{11}$ does not change since it is 
symmetric with respect to X and Y. 

If the pairs of numbers $(x_k, y_k)$ appear $n_k$ times, the form of 
the formula does not change and the parameters are calculated 
from the weight formulas. Here, the formula for $\sigma_y^2$ undergoes the 
obvious change. 
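
The statement that the second formula follows from the first only for an exact linear dependence can be illustrated numerically. If the line X on Y were merely the line Y on X solved for X, the slope of the second would be the reciprocal of the slope of the first, so the product of the two slopes would equal 1. The sketch below (function name and data are assumptions for illustration) computes both slopes.

```python
def regression_slopes(xs, ys):
    """Return (slope of Y on X, slope of X on Y) = (mu11/var_x, mu11/var_y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    var_x = sum(x * x for x in xs) / n - x_bar ** 2
    var_y = sum(y * y for y in ys) / n - y_bar ** 2
    mu11 = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    return mu11 / var_x, mu11 / var_y


# Exact linear dependence y = 1 + 2x: the two lines coincide.
b_exact, d_exact = regression_slopes([1, 2, 3, 4], [3, 5, 7, 9])

# Scattered data: the two lines are different.
b_noisy, d_noisy = regression_slopes([1, 2, 3, 4], [2, 1, 4, 3])

print(b_exact * d_exact)  # product of slopes for the exact line
print(b_noisy * d_noisy)  # product of slopes for the scattered data
```

For the scattered data the product is less than 1, so the second line is not the first one solved for X.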


103. DERIVATION OF THE LINEAR EQUATIONS 
OF REGRESSION 


In the preceding section, we derived formulas for determining the 
coefficients in the linear empirical formulas 

$$ Y = a + bX \quad \text{and} \quad X = c + dY. $$

We obtained the equations for the straight lines 

$$ Y - \bar{y} = \frac{\mu_{11}}{\sigma_x^2}\,(X - \bar{x}) \quad \text{and} \quad X - \bar{x} = \frac{\mu_{11}}{\sigma_y^2}\,(Y - \bar{y}). \tag{19.14} $$

It is clear from these equations that both lines pass through the 
center of distribution. 

These formulas were derived from Table A simply as empirical 
formulas for expressing Y in terms of X and vice versa. By 
definition, however, the equations of regression must 
be empirical formulas for determining the average value of y as a 
function of x and the average value of x as a function of y. Starting 
with this definition of the equations of regression, let us show 
that these equations are of the same form as the empirical for- 
mulas (19.14). Let us set up tables like Table $B_1$ and Table $B_2$. In 
each row, we calculate the average values of Y and X and denote 
them respectively by $\bar{y}_x$ and $\bar{x}_y$. 

We must set up the equations of regression for these averages 
and the values of X and Y corresponding to them by using the for- 
mulas of the preceding section. However, in so doing, we need to 
take into consideration the fact that if, for example, three values 
of y correspond to some value $x^{(m)}$, the pair of numbers $(x^{(m)}, \bar{y}_{x,m})$ 
obtained after determining the average value of y must be considered 
equivalent to three pairs of numbers and we must determine the 
average values from the weight formulas. Suppose that Table $B_1$ 
(extended as described earlier) is of the form 

$$
\begin{array}{lll}
x^{(1)} & y_{11},\ y_{12},\ \ldots,\ y_{1 r_1} & \bar{y}_{x,1} \\
x^{(2)} & y_{21},\ y_{22},\ \ldots,\ y_{2 r_2} & \bar{y}_{x,2} \\
\cdots  & \cdots & \cdots \\
x^{(l)} & y_{l1},\ y_{l2},\ \ldots,\ y_{l r_l} & \bar{y}_{x,l}
\end{array}
\tag{$B_1$}
$$

In this table, l is the number of different values of x that appear in 
Table A and $r_1, r_2, \ldots, r_l$ represent the number of values of y 
corresponding to the quantities $x^{(1)}, x^{(2)}, \ldots, x^{(l)}$. Obviously, 

$$ r_1 + r_2 + \cdots + r_l = n, \qquad l \le n. $$


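Thus, we need to construct an empirical formula for the table 

$$
\begin{array}{ll}
x^{(1)} & \bar{y}_{x,1} \\
x^{(2)} & \bar{y}_{x,2} \\
\cdots  & \cdots \\
x^{(l)} & \bar{y}_{x,l}
\end{array}
$$

The numbers $\bar{y}_{x,1}, \bar{y}_{x,2}, \ldots$ are calculated from the for- 
mulas 

$$ \bar{y}_{x,1} = \frac{y_{11} + y_{12} + \cdots + y_{1 r_1}}{r_1}, \quad \ldots, \quad \bar{y}_{x,l} = \frac{y_{l1} + y_{l2} + \cdots + y_{l r_l}}{r_l}. $$

The weighted average value $\bar{y}$ will be 

$$ \bar{y} = \frac{r_1\bar{y}_{x,1} + r_2\bar{y}_{x,2} + \cdots + r_l\bar{y}_{x,l}}{r_1 + r_2 + \cdots + r_l} = \frac{1}{n}\sum_{i=1}^{l}\sum_{j=1}^{r_i} y_{ij}. $$

Since the numerator of the last fraction is simply the sum of all 
possible values of y, the weighted mean is equal to the ordinary 
average, which is obtained from Table A. An analogous result 
holds for x. 

Furthermore, from the weight formula, 

$$ \sigma_x^2 = \frac{r_1 (x^{(1)})^2 + r_2 (x^{(2)})^2 + \cdots + r_l (x^{(l)})^2}{r_1 + r_2 + \cdots + r_l} - \bar{x}^2. $$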


But this is just the formula from which we would calculate the simple 
mean square deviation of X from Table A, since the number of 
values $x^{(1)}$ in it is $r_1$, etc. To determine $\mu_{11}$ from Table $B_1$, we 
would need to write 

$$ \mu_{11} = \frac{r_1 x^{(1)}\bar{y}_{x,1} + r_2 x^{(2)}\bar{y}_{x,2} + \cdots + r_l x^{(l)}\bar{y}_{x,l}}{r_1 + r_2 + \cdots + r_l} - \bar{x}\bar{y}. $$

If we remember the expressions for $\bar{y}_{x,1}, \bar{y}_{x,2}, \ldots$, then 

$$ \mu_{11} = \frac{x^{(1)}y_{11} + x^{(1)}y_{12} + \cdots + x^{(1)}y_{1 r_1} + \cdots + x^{(l)}y_{l1} + \cdots + x^{(l)}y_{l r_l}}{n} - \bar{x}\bar{y}. $$

The numerator again contains the sum of the products of all values 
of x (including the repeated ones) multiplied by the corresponding 
values of y; that is, $\mu_{11}$ would have to be calculated just as for 
Table A. 

Thus, when we set up the empirical formula for determining Y 
from X, we obtain the same result as when we set up the equations 
of regression for $\bar{y}_x$ as a function of x. 

Thus, the equations of regression are of the form 

$$ \bar{y}_x - \bar{y} = \frac{\mu_{11}}{\sigma_x^2}\,(X - \bar{x}), \qquad \bar{x}_y - \bar{x} = \frac{\mu_{11}}{\sigma_y^2}\,(Y - \bar{y}). \tag{19.15} $$

For simplicity, we often replace $\bar{y}_x$ and $\bar{x}_y$ with y and x, respectively. 
As was just shown, this involves no error. The numbers 

$$ \rho_{y/x} = \frac{\mu_{11}}{\sigma_x^2}, \qquad \rho_{x/y} = \frac{\mu_{11}}{\sigma_y^2} \tag{19.16} $$

are called the coefficients of regression. 
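
The argument of this section, that a weighted fit through the row averages gives the same line as a fit through all the individual observations, can be checked on a small example. The following sketch is illustrative only; the helper names `weighted_fit` and `group_by_x` and the data are assumptions.

```python
from collections import defaultdict


def weighted_fit(xs, ys, ws):
    """Coefficients (a, b) of the line y = a + b*x from the weight formulas."""
    n = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / n
    y_bar = sum(w * y for w, y in zip(ws, ys)) / n
    var_x = sum(w * x * x for w, x in zip(ws, xs)) / n - x_bar ** 2
    mu11 = sum(w * x * y for w, x, y in zip(ws, xs, ys)) / n - x_bar * y_bar
    b = mu11 / var_x
    return y_bar - b * x_bar, b


def group_by_x(xs, ys):
    """Distinct values of x, the averages of y in each row, and the row sizes."""
    rows = defaultdict(list)
    for x, y in zip(xs, ys):
        rows[x].append(y)
    keys = sorted(rows)
    means = [sum(rows[x]) / len(rows[x]) for x in keys]
    counts = [len(rows[x]) for x in keys]
    return keys, means, counts


# Made-up observations with repeated values of x.
xs = [1, 1, 2, 3, 3, 3]
ys = [2, 4, 3, 5, 7, 9]

a_full, b_full = weighted_fit(xs, ys, [1] * len(xs))  # all observations
gx, gy, gr = group_by_x(xs, ys)
a_rows, b_rows = weighted_fit(gx, gy, gr)             # row averages with weights

print(a_full, b_full)
print(a_rows, b_rows)
```

If the row sizes were ignored (all weights set to 1 in the second call), the two lines would in general differ, which is why the weights are essential.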


104. THE CORRELATION COEFFICIENT 
To make the following derivations simpler, let us assume that the 
center of distribution $(\bar{x}, \bar{y})$ has been determined and that all the 
numbers x and y have been replaced with their deviations from 
their mean values. We denote these deviations by u and v. The 
numbers u and v are related to x and y by the obvious equations 

$$ u = x - \bar{x}, \qquad v = y - \bar{y}. \tag{19.17} $$


If we change from x and y to u and v, the equations of regression 
(19.15) become 

$$ \bar{v}_u = \frac{\mu_{11}}{\sigma_u^2}\,u, \qquad \bar{u}_v = \frac{\mu_{11}}{\sigma_v^2}\,v. \tag{19.18} $$

For brevity, we may write v and u to represent the left sides of 
these equations, respectively. Since, from (19.17), 

$$ x = u + \bar{x}, \qquad y = v + \bar{y} $$

and since $\bar{x}$ and $\bar{y}$ are constants, we have 

$$ \bar{u} = \bar{v} = 0, \qquad \sigma_u = \sigma_x, \qquad \sigma_v = \sigma_y. \tag{19.19} $$

The number $\mu_{11}$ and the variances $\sigma_u^2$ and $\sigma_v^2$ can be calculated from 
the formulas 

$$ \mu_{11} = \frac{1}{n}\sum_{k=1}^{n} u_k v_k, \tag{19.20} $$

$$ \sigma_u^2 = \frac{1}{n}\sum_{k=1}^{n} u_k^2, \qquad \sigma_v^2 = \frac{1}{n}\sum_{k=1}^{n} v_k^2. \tag{19.21} $$

It is convenient to make calculations from these formulas by using 
the table directly if n is small. If n is large, it is better to use 
(19.11). 

For simplicity, we shall assume the deviations from the aver- 
ages, that is, the numbers $u_k$ and $v_k$, as given. 

We have obtained two straight lines of regression, which, as we 
stated, pass through the center of distribution. Their directions 
are determined by the coefficients of regression. The first is the 
tangent of the angle formed by the line of regression of v (expressed 
in terms of u) and the u-axis. The second is the tangent of the 
angle between the line of regression of u (expressed in terms of v) and 
the v-axis, since in this equation u and v have switched positions. 
We denote these angles by $\alpha$ and $\beta$ (Fig. 17). 


Fig. 17. The lines of regression. 


The coefficients of regression may be both positive ($\mu_{11} > 0$) or 
both negative ($\mu_{11} < 0$). The two lines of regression generally do 
not coincide. If the correlational dependence becomes a perfectly 
linear functional relation, the two lines of regression must then 
coincide, because in that case it does not matter whether v is ex- 
pressed in terms of u or vice versa. In the case of coinciding lines 
of regression, 

$$ \tan\alpha \tan\beta = 1 $$

because then 

$$ \alpha + \beta = \frac{\pi}{2} \qquad (\mu_{11} > 0). $$

If there is no connection between u and v, on the average, $\bar{v}_u$ will 
change only slightly when u changes and vice versa. In this case, 
$\alpha$ and $\beta$ are close to zero and in the limit, $\tan\alpha \tan\beta = 0$. The 
square root of the number 

$$ \tan\alpha \tan\beta = \frac{\mu_{11}^2}{\sigma_u^2\,\sigma_v^2} \tag{19.22} $$

is taken for the degree of closeness of the correlational dependence 
to a linear functional dependence. 

Definition. The linear correlation coefficient of two random 
variables x and y is the number r defined by 

$$ r = \frac{\mu_{11}}{\sigma_u\,\sigma_v}, \tag{19.23} $$

where u and v are the deviations of x and y from their average 
values and 

$$ \mu_{11} = \frac{1}{n}\sum_{k=1}^{n} u_k v_k. \tag{19.24} $$

It is clear from formula (19.23) that r is a dimensionless 
quantity; that is, it does not depend on the units in which the quanti- 
ties that we are studying are being measured. It is also independent 
of the coordinate origin, since only the central moments appear in 
the expression. 
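
The definition of r and its independence of units and origin can be illustrated with a short computation. The sketch below is only an example: the function name `correlation` and the data are assumptions, and the change of units is taken with positive scale factors.

```python
from math import sqrt


def correlation(xs, ys):
    """Linear correlation coefficient r = mu11 / (sigma_u * sigma_v)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    us = [x - x_bar for x in xs]  # deviations of x from its average
    vs = [y - y_bar for y in ys]  # deviations of y from its average
    mu11 = sum(u * v for u, v in zip(us, vs)) / n
    sigma_u = sqrt(sum(u * u for u in us) / n)
    sigma_v = sqrt(sum(v * v for v in vs) / n)
    return mu11 / (sigma_u * sigma_v)


xs = [1, 2, 3, 4]
ys = [2, 1, 4, 3]
r = correlation(xs, ys)

# Change of units (positive factor) and of origin for both variables.
r_rescaled = correlation([10 * x + 5 for x in xs], [0.5 * y - 3 for y in ys])

print(r, r_rescaled)
```

A negative scale factor applied to one variable would reverse the sign of r while leaving its absolute value unchanged.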

It is clear from (19.16) and (19.23) that the correlation coef- 
ficient is equal to the square root of the product of the coefficients 
of regression. If we solve for $\mu_{11}$ in (19.23), we may make the 
substitution 

$$ \mu_{11} = r\,\sigma_u\,\sigma_v $$

in the equations of regression (19.18). These then become 

$$ \bar{v}_u = r\,\frac{\sigma_v}{\sigma_u}\,u, \qquad \bar{u}_v = r\,\frac{\sigma_u}{\sigma_v}\,v. \tag{19.25} $$

As we have seen, the correlational dependence becomes a 
linear functional relationship if $\tan\alpha \tan\beta = 1$, that is, if $r = \pm 1$ 
(the sign depending on the sign of $\mu_{11}$). 

On the other hand, if we assume, for example, that the points in 
the field of correlation are pairwise symmetric about the coordi- 
nate axes, the lines of regression will coincide with the coordinate 
axes and $\mu_{11}$ will vanish. In this case, we may consider u and v 
independent. 

We obtain all gradations between complete independence and a 
linear functional dependence when the sum $\alpha + \beta$ varies from 0 to 
$\pi/2$ with $\mu_{11} > 0$. Here, r varies monotonically from 0 to +1. (We 
exclude the theoretically possible case of $\alpha \to 0$, $\beta \to \pi/2$ since this 
case is of no significance in practice.) 

If $\mu_{11}$ is positive, v will increase on the average with increasing 
u and vice versa. In this case, the choice of the formula for r 
yields r > 0, which is somewhat convenient. 

The number $\mu_{11}$ in the formula for r can be calculated in various 
ways. From (19.24), 

$$ \mu_{11} = \frac{1}{n}\sum_{k=1}^{n} u_k v_k. $$

Here, $u_k$ and $v_k$ are the deviations from the averages. In the 
majority of cases, it is not convenient to use the deviations from 
the averages. Therefore, we use formulas containing $x_k$ and $y_k$: 

$$ \mu_{11} = \frac{1}{n}\sum_{k=1}^{n} x_k y_k - \bar{x}\bar{y}, \tag{19.26} $$

if Table A is being compiled. However, this should be done only 
when n is small. 

If we compile the correlational Table C, let us agree to relate 
all the cases shown in each rectangle to the midpoints of the in- 
tervals x and y, just as in the case of the distribution of a single 
variable. 

Let us recall the notations used above: $x_k$ and $y_l$ are the values 
of x and y corresponding to the midpoints of the intervals of the 
variables x and y; r is the number of intervals of the variable x; 
s is the number of the intervals of the variable y; $n_{kl}$ is the number 
of outcomes in a rectangle corresponding to $x_k$ and $y_l$. Then, to 
calculate $\mu_{11}$, we may write the formula 

$$ \mu_{11} = \frac{1}{n}\sum_{k=1}^{r}\sum_{l=1}^{s} n_{kl}\,x_k\,y_l - \bar{x}\bar{y}. \tag{19.27} $$

k=1lq—l1 


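The numbers $\bar{x}$, $\bar{y}$, $\sigma_x$, and $\sigma_y$ are calculated from the rules for de- 
termining the characteristics of the distribution of a single 
variable: 

$$ \bar{x} = \frac{1}{n}\sum_{k=1}^{r} n_{k0}\,x_k, \qquad \bar{y} = \frac{1}{n}\sum_{l=1}^{s} n_{0l}\,y_l, $$

$$ \sigma_x^2 = \frac{1}{n}\sum_{k=1}^{r} n_{k0}\,x_k^2 - \bar{x}^2, \qquad \sigma_y^2 = \frac{1}{n}\sum_{l=1}^{s} n_{0l}\,y_l^2 - \bar{y}^2. \tag{19.28} $$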
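
The calculation of r from a correlational table, with every count referred to the midpoints of its rectangle, can be sketched as follows. The function name `correlation_from_table`, the layout `table[l][k]` (row l for the interval of y, column k for the interval of x) and the two small tables are assumptions made for illustration.

```python
from math import sqrt


def correlation_from_table(x_mid, y_mid, table):
    """r from a correlational table; table[l][k] is the count for (x_k, y_l)."""
    dist_y = [sum(row) for row in table]        # row sums, n_0l
    dist_x = [sum(col) for col in zip(*table)]  # column sums, n_k0
    n = sum(dist_y)
    x_bar = sum(c * x for c, x in zip(dist_x, x_mid)) / n
    y_bar = sum(c * y for c, y in zip(dist_y, y_mid)) / n
    var_x = sum(c * x * x for c, x in zip(dist_x, x_mid)) / n - x_bar ** 2
    var_y = sum(c * y * y for c, y in zip(dist_y, y_mid)) / n - y_bar ** 2
    mu11 = sum(
        table[l][k] * x_mid[k] * y_mid[l]
        for l in range(len(y_mid))
        for k in range(len(x_mid))
    ) / n - x_bar * y_bar
    return mu11 / sqrt(var_x * var_y)


# All observations on the diagonal: an exact linear relation between midpoints.
r_diag = correlation_from_table([1, 2, 3], [10, 20, 30],
                                [[5, 0, 0], [0, 5, 0], [0, 0, 5]])

# The same count in every rectangle: no linear connection.
r_flat = correlation_from_table([0, 1], [0, 1], [[1, 1], [1, 1]])

print(r_diag, r_flat)
```

Because every observation is moved to a midpoint, the value obtained from the table generally differs slightly from the value computed from the ungrouped pairs.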


105. THE AVERAGE ERRORS IN THE EQUATIONS OF 
REGRESSION. BOUNDS FOR THE VALUES OF THE 
CORRELATION COEFFICIENT 


If we use the equation of regression for determining y from a given 
value of x (or vice versa), the computed value $y_c$ will differ from 
each of the values of y that in actuality correspond to the number 
x in the observations. 

We may say that when we substitute x into the equation of re- 
gression for calculating y, we will obtain $y_c$ with an error (more 
precisely, with a deviation from observations). The question 
naturally arises as to the value of the average "error" in the 
equation of regression, so that we may judge the extent to which the 
points of the field are dispersed about the line of regression. In 
determining the average error (that is, the mean square deviation), 
it is convenient to deal not with the values of the variable, but with 
the deviations of these values from the average values. Therefore, 
we write the equation of regression (19.18) in the form (19.25): 

$$ v = r\,\frac{\sigma_v}{\sigma_u}\,u, \qquad u = r\,\frac{\sigma_u}{\sigma_v}\,v. $$


Here, we drop the macrons and the subscripts from the left side of 
each of these equations, since we are interested only in comparing 
the value $v_c$, calculated from the equation of regression for a given 
u, with all the values v obtained from observations (Table A). 

We denote the mean square errors in the equations of regression 
by $s_v$ and $s_u$. By definition, 

$$ n s_v^2 = \sum_{k=1}^{n}\left(v_k - r\,\frac{\sigma_v}{\sigma_u}\,u_k\right)^2 = \sum_{k=1}^{n} v_k^2 - 2r\,\frac{\sigma_v}{\sigma_u}\sum_{k=1}^{n} u_k v_k + r^2\,\frac{\sigma_v^2}{\sigma_u^2}\sum_{k=1}^{n} u_k^2. $$

Since $u_k$ and $v_k$ are the deviations from the average values, 

$$ \sum_{k=1}^{n} u_k^2 = n\sigma_u^2, \qquad \sum_{k=1}^{n} v_k^2 = n\sigma_v^2, \tag{19.29} $$

$$ \sum_{k=1}^{n} u_k v_k = n\mu_{11} = n r\,\sigma_u\,\sigma_v. $$

Therefore, 

$$ s_v^2 = \sigma_v^2\,(1 - r^2), \qquad s_v = \sigma_v\sqrt{1 - r^2}. \tag{19.30} $$

In an analogous manner, we obtain 

$$ s_u = \sigma_u\sqrt{1 - r^2}. \tag{19.31} $$


From formulas (19.30), we may draw the following conclusions. 
If we use the equation of regression for determining the value of 
one variable from the value of the other, the average error in this 
determination is less than the average error in the first variable, 
provided its value is replaced with the average value obtained from 
the entire distribution. 
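
The relation between the mean square error of the regression equation and the correlation coefficient can be verified directly: the mean of the squared deviations of the observed v from the regression line should equal the variance of v multiplied by 1 - r^2. The sketch below is an illustration only; the function name `regression_error` and the data are assumptions.

```python
from math import sqrt


def regression_error(xs, ys):
    """Return (s_v^2 computed from the residuals, var_v * (1 - r^2))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    us = [x - x_bar for x in xs]
    vs = [y - y_bar for y in ys]
    var_u = sum(u * u for u in us) / n
    var_v = sum(v * v for v in vs) / n
    mu11 = sum(u * v for u, v in zip(us, vs)) / n
    r = mu11 / sqrt(var_u * var_v)
    slope = mu11 / var_u  # coefficient of regression of v on u
    # Mean of the squared deviations of the observations from the line.
    s_v_sq = sum((v - slope * u) ** 2 for u, v in zip(us, vs)) / n
    return s_v_sq, var_v * (1 - r * r)


direct, from_r = regression_error([1, 2, 3, 4], [2, 1, 4, 3])
print(direct, from_r)
```

Both numbers are smaller than the variance of v itself, which is the content of the conclusion drawn in the text.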

Formulas (19.30) and (19.31) enable us to determine exactly 
the limits between which r can vary. The quantity $s_v^2$ is a nonnegative 
number. Therefore, it follows from the formula for $s_v^2$ that $r^2 \le 1$ 
or 

$$ -1 \le r \le 1. \tag{19.32} $$

We may obtain a similar conclusion regarding $s_u^2$. 

The correlation coefficient r can be equal to -1 or 1 only if 
$s_v^2 = 0$ or $s_u^2 = 0$. The quantity $n s_v^2$ is the sum of the squares of the 
deviations and it can be equal to 0 only when each deviation is 
equal to 0. The same is true of $n s_u^2$. Thus, if $r = \pm 1$, this means 
that the equations of regression give the exact values of v in terms 
of u and vice versa. This in turn means that we have a linear 
functional relationship. 

We note in particular that the correlation coefficient should 
indicate the degree of connection between two random variables. 
It should be clearly understood that this number alone is not 
sufficient to characterize the distribution in this respect.* There- 
fore, we should be careful in forming conclusions based on the 
value of the correlation coefficient. If $r \approx 0$, this still does not 
mean that there is no relationship. The relationship may be close 
to a nonlinear one, in which the value of r plays no role. 
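
A simple numerical illustration of this warning: for points lying exactly on a parabola placed symmetrically about the origin, y is a function of x, and yet the linear correlation coefficient vanishes. The function name `correlation` and the data are assumptions chosen for this sketch.

```python
from math import sqrt


def correlation(xs, ys):
    """Linear correlation coefficient of the paired values xs, ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    mu11 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sigma_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sigma_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return mu11 / (sigma_x * sigma_y)


# Exact functional (but nonlinear) dependence y = x^2 on symmetric points.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
r_parabola = correlation(xs, ys)
print(r_parabola)
```

The value is zero because the positive and negative products of deviations cancel in pairs, not because the variables are unrelated.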


106. AVERAGE ERRORS OF SAMPLE COEFFICIENTS OF 
CORRELATION AND REGRESSION 


The concept of a general set can be extended to the case in which 
we study the distribution of several random variables (in particu- 
lar, two). For example, if we are interested in the correlation 
between the visible magnitude and the distance of stars of a certain 
spectral class, we should consider all those observations that we 
might theoretically perform (by observing all stars) as belonging 
to the general set. In the study of the relationship between the 
height of men called into the army and their weight, the general 
set consists of the total data on the height and weight of all per- 
sons examined. In practice, the study of the general set either is 
too tedious, as in the second example, or is impossible, 
as in the first example, where the concept of the general set is 
only of a theoretical nature. Ordinarily, we deal with a sample 
taken from the general set. Thus, for example, four standard ob- 
servations of temperature and humidity during a year represent a 
choice from the theoretical general set of these quantities. 

*In practice, we usually assume that the variables are sufficiently related if |r| > 0.7 
or 0.6. However, we may conclude that there is a relationship even when r is less than 
this, if physical considerations substantiate this view. On the other hand, even when 
|r| = 0.9, one should not categorically speak of a dependence if it cannot be explained 
on a physical basis. 

However, it is of interest to form a picture of the nature of the 
general set on the basis of the sample. To do this, we need to find 
out to what extent the sample correlation coefficient and the sample 
coefficients of regression represent the same quantities in the 
general set. 

On the basis of the law of large numbers, we can only say that 
for a sufficiently large sample, the sample parameters can differ 
by an arbitrarily small amount from the parameters of the general 
set and that we can expect this with a probability close to unity. 

Pearson investigated the distribution of the sample values of r 
in a general set obeying a normal law of distribution (see Chapter 
12). The law of distribution turned out to be rather complicated. 
The distribution of the sample correlation coefficients differs 
from a normal distribution and approximates it only if the sample 
is large (and provided that the value of the correlation coefficient 
is not close to unity). From the distribution function, we may derive an 
approximate expression for the average error of the sample 
correlation coefficient: 

$$ \sigma_r = \frac{1 - r^2}{\sqrt{n}}, \tag{19.33} $$

where n is the size of the sample (the number of observations). 

From what was said concerning the distribution of the sample 
values of r, we can assign the quantity $\sigma_r$ the same meaning as in a 
normal distribution if the sample is large. In the opposite case, 
$\sigma_r$ approximates the possible bounds for the general coefficient 
only very crudely (for example, according to the three-sigma rule). 

For the average errors of the sample coefficients of regression, 
we obtain the following approximate expressions: 

$$ \sigma(\rho_{y/x}) = \frac{\sigma_y}{\sigma_x}\sqrt{\frac{1 - r^2}{n}}, \qquad \sigma(\rho_{x/y}) = \frac{\sigma_x}{\sigma_y}\sqrt{\frac{1 - r^2}{n}} \tag{19.34} $$

for the regression of Y with respect to X and X with respect to Y, 
respectively. 
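
As a numerical sketch of these estimates, the code below evaluates the average error of r in the usual large-sample form (1 - r^2)/sqrt(n), the corresponding three-sigma bounds, and the average error of a coefficient of regression in the form (sigma_y/sigma_x) sqrt((1 - r^2)/n). These forms, the function names and the numbers are assumptions made for illustration, and they apply only to large samples.

```python
from math import sqrt


def r_average_error(r, n):
    """Approximate average error of the sample correlation coefficient."""
    return (1 - r * r) / sqrt(n)


def three_sigma_bounds(r, n):
    """Crude bounds for the general coefficient by the three-sigma rule."""
    e = r_average_error(r, n)
    # r itself cannot leave the interval [-1, 1], so the bounds are clipped.
    return max(-1.0, r - 3 * e), min(1.0, r + 3 * e)


def regression_coefficient_error(sigma_y, sigma_x, r, n):
    """Approximate average error of the coefficient of regression of Y on X."""
    return (sigma_y / sigma_x) * sqrt((1 - r * r) / n)


# Made-up sample characteristics: r = 0.6 from n = 100 observations.
err_r = r_average_error(0.6, 100)
low, high = three_sigma_bounds(0.6, 100)
err_b = regression_coefficient_error(2.0, 1.0, 0.6, 100)
print(err_r, (low, high), err_b)
```

For small samples these numbers should not be relied upon, as the text goes on to explain.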


If the sample is very small, calculation of the average errors 
from these formulas gives virtually no results. In such cases, it 
is necessary to estimate the correlation from the theory of a small 
sample by using Fisher's investigations. A very clear exposition 
of this problem can be found in the book by V. I. Romanovskii 
Elementarnyi kurs matematicheskoi statistiki (Elementary Course 
in Mathematical Statistics). 


107. THE PROBABILISTIC SIGNIFICANCE OF THE 
ELEMENTARY THEORY OF CORRELATION 


In the preceding sections of this chapter, we constructed linear empirical formulas ap- 
proximately representing the dependence of the average values of each of the variables 
on the corresponding values of the other variables, We also introduced the correlation 
coefficient as a measure of the deviation of the relationship between the two variables 
from a linear relationship, 

We have discussed the problem of studying statistical relationships between two 
variables on the basis of observed pairs of their values, without considering the rela- 
tionship of this approach to probability theory, We must show the relationship between 
Chapter 12 (dealing with sets of two random variables) and the present chapter in order 
to determine the conditions under which we may apply the computing procedures that we 
have now developed, 

The apparatus of the theory of correlation that we have developed gives five numbers 
that serve as overall characteristics of the sample set being considered: the average 
values of each of the variables, the mean square deviations of each variable in partic- 
ular, and the correlation coefficient, Under what conditions are these five numbers 
sufficient to describe the sample set and to determine approximately the parameters of 
the general set from which the sample was taken? 

In Chapter 12, we examined a normal distribution of a set of two random variables.
It was shown there that the conditional expectation of each variable (when the value of
the other variable was given) is a linear function of the second variable. This
corresponds to the condition taken in the present chapter of constructing linear equations of
regression.

The coefficient of correlation r, which was introduced earlier in the chapter, can be
formally defined as the quotient that results when we divide the average value of the
products of the deviations of both variables from their average values by the product of
their mean square deviations.

As was shown in Section 64, the parameter k of a two-dimensional normal law is
equal to the quotient that results when we divide the expectation of the product of the
deviations of the variables from their expectations by the product of the mean square
deviations of these variables. If we remember that the expectation of a random variable
is its average value, we obtain a complete correspondence between the numbers r and k.
We may say that r is an approximate value of k obtained from the sample set, if we
assume that the general set obeys a normal law of distribution.
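The formal definition of r given above can be sketched in a few lines of Python. This is only an illustration: the function name `correlation_coefficient` and the sample data are assumptions made for the example and do not come from the text. All moments are taken with the divisor n, as in the definition.

```python
import math

def correlation_coefficient(xs, ys):
    """Average product of the deviations divided by the product of the
    mean square deviations (all moments use the divisor n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # average value of the products of the deviations
    mu11 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return mu11 / (sigma_x * sigma_y)

# hypothetical data lying exactly on a straight line, so that r = 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
r = correlation_coefficient(xs, ys)
```

Reversing the sign of one variable should reverse the sign of r, which gives a quick check of the function.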

From what has been said about the equations of regression and the correlation
coefficient, it follows that study of a linear correlation is admissible if the variables in
question have a normal distribution. The parameters of the distribution
($\bar{x}$, $\bar{y}$, $\sigma_x$, $\sigma_y$, $r$) can then be considered sufficient to describe
the set of two variables that we are studying (provided we can assume the distribution
to be normal).

Calculation of the mean square errors in these parameters is necessary, since this
makes it possible for us to evaluate the reliability of the values of the parameters after
they are calculated.

If our results can be considered reliable, we may write the probability density with
the approximate values of the parameters and we may calculate the probability that the
values of the variables will belong to some given region by integrating the density over
that region.
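The integration of the density over a region can be sketched numerically. The following is a minimal illustration, assuming a rectangular region and a simple midpoint rule; the function names and the parameter values (zero centers, unit mean square deviations, r = 0.5) are hypothetical and were chosen only because the probability of the positive quadrant then has the known closed form 1/4 + arcsin(r)/(2π) = 1/3.

```python
import math

def bivariate_normal_density(x, y, mx, my, sx, sy, r):
    """Density of a two-dimensional normal law with correlation r."""
    u = (x - mx) / sx
    v = (y - my) / sy
    q = (u * u - 2.0 * r * u * v + v * v) / (1.0 - r * r)
    norm = 2.0 * math.pi * sx * sy * math.sqrt(1.0 - r * r)
    return math.exp(-0.5 * q) / norm

def probability_in_rectangle(x0, x1, y0, y1, params, steps=400):
    """Midpoint-rule integral of the density over [x0, x1] x [y0, y1]."""
    hx = (x1 - x0) / steps
    hy = (y1 - y0) / steps
    total = 0.0
    for i in range(steps):
        x = x0 + (i + 0.5) * hx
        for j in range(steps):
            y = y0 + (j + 0.5) * hy
            total += bivariate_normal_density(x, y, *params)
    return total * hx * hy

# hypothetical parameters: centers 0, unit deviations, r = 0.5
params = (0.0, 0.0, 1.0, 1.0, 0.5)
# the positive quadrant, truncated at 8 mean square deviations
p_quadrant = probability_in_rectangle(0.0, 8.0, 0.0, 8.0, params)
```

In practice the sample values of the five parameters would be substituted for the hypothetical ones, and the limits of the rectangle would be those of the region of interest.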


108. THE PROCEDURE FOR INVESTIGATING THE 
CORRELATION IN THE CASE OF A LARGE 
NUMBER OF OBSERVATIONS. 

AN EXAMPLE 


If there are no more than about fifty observations, the parameters can be calculated
from the formulas given in Sections 103 and 104.


[Diagram (page 328): computing scheme for the correlation table of the variables m and P, with columns I-XI and rows (1)-(11). The entries are not recoverable from the scan.]



Let us now examine the method of calculating the correlation coefficient and of
setting up the equations of regression when there is a large amount of observational
material.

Let us calculate the correlation coefficient and derive the equations of regression
for the visible magnitudes m and the logarithms P of the distances in parsecs of
stars of the spectral types B5-B9.

The calculations should be arranged as shown in the diagram on page 328.


Description and Explanation of the Diagram 


Column I gives the values of the variable P at the midpoints of its intervals. Since the
step is equal to 0.2, the first interval contains stars with values of P from 1.0 to 1.2; the
second contains those from 1.2 to 1.4, etc. Row (1) gives the values of the variable m
at the midpoints of the intervals. Here, the step is equal to unity. Therefore, the first
interval contains those stars whose visible magnitude lies between 1 and 2, the second
contains those from 2 to 3, etc. Column II contains the values of the midpoints of the
intervals of the variable P in conventional units as measured from the zero point agreed on.
Row (2) contains the same values for the variable m. Both of these are filled out
after calculating column V and row (5), just as in the case of a one-dimensional
distribution.

The numerical matrix, denoted by III-(3), is a correlational table of type C. It is
obtained by counting from a catalogue the number of stars whose visible brightness and
whose values of P lie in the intervals corresponding to each rectangle in the diagram.
The sum of all the elements in the matrix must be equal to the total number of
observations. The correlational table is surrounded by heavy lines. Directly to the right of it is
column IV and then column V. Column IV is a repetition of column II and consequently is
filled out after column V. The diagram can be simplified somewhat by eliminating either
column II or column IV. It is more convenient to keep column IV next to column V.
Column V is obtained by adding all the numbers in every row of the matrix III. Column
V and column I together show the distribution of the single variable P. These columns
together with columns II and IV give the distribution of the same variable in conventional
units (that is, of the variable Y, as is indicated by the column heads in the diagram). The
same may be said with regard to rows (4) and (5) with the difference that we must speak
of the variable m or X.

Calculations of the numbers in column V and row (5) can be checked by adding all the
numbers in the line. The sum must be equal to the total number of observations. Therefore,
the number 376 appears twice in the diagram: once in column V and once in the
continuation of row (5). The places, such as here, where we need to perform this addition
are indicated by the word ‘‘sum.’’ A diagonal line is drawn across those rectangles
where we do not need to take the sum.
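The check on the marginal totals can be sketched as follows. The small table is hypothetical (the entries of the actual diagram are not reproduced here), and the variable names are assumptions made for the illustration.

```python
# hypothetical correlation table: rows are intervals of Y, columns of X
table = [
    [2, 1, 0],
    [3, 5, 1],
    [0, 4, 2],
]

# analogue of column V: total of every row
row_sums = [sum(row) for row in table]
# analogue of row (5): total of every column
col_sums = [sum(col) for col in zip(*table)]

# both marginal distributions must add up to the number of observations
total_from_rows = sum(row_sums)
total_from_cols = sum(col_sums)
check_passed = (total_from_rows == total_from_cols)
```

In the example of the text both totals are equal to 376.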

After computing column V, we must choose our zero point for the variable P. The
maximum of the distribution of this variable, that is, the largest value of the multiplicity
(of the numbers $n_{0l}$), appears in that interval in which the midpoint has the value 2.1. We
could take our zero point here, but the preceding value of $n_{0l}$ is rather large. There are
also several lower values in the table. Therefore, it would be better to take our zero
point at the midpoint of the interval in which $n_{0l}$ is equal to 109.* For the variable m, we
take the zero point at the midpoint of the interval in which $n_{k0}$ has its maximum.

Columns IV and V and rows (4) and (5), which adjoin the matrix III, give the
distributions of the variables Y and X in question taken separately. Therefore, they are bounded
by heavy lines.

Column VI gives the figure we need for calculating the average value of the variable
P in conventional units (that is, of the variable Y). Therefore, the column is labelled
$n_{0l} y_l$; that is, every number in column V must be multiplied by the number next to it
in column IV. The numbers obtained are then added. A completely analogous operation
is done in the case of row (6).

In column VII, the preparatory calculations are made for determining the
second-order initial moment of the variable Y. This is necessary for calculating the mean
square deviation of that variable. The numbers in column VII are obtained by
multiplying the numbers in column VI by the numbers in column IV. Analogous operations
are performed in row (7). They give the sum that is necessary for calculating the mean
square deviation of the variable m. Columns VI and VII and rows (6) and (7) contain the
final result for one-dimensional distributions of the variables P and m.
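The role of columns VI and VII can be sketched as follows. The counts are hypothetical, and the function name `grouped_moments` is an assumption made for the illustration; only the zero point 1.9 and the step 0.2 are the values used in the text for the variable P.

```python
import math

def grouped_moments(counts, units):
    """Mean and mean square deviation of a grouped variable measured in
    conventional units (integer offsets from the chosen zero point)."""
    n = sum(counts)
    first = sum(c * u for c, u in zip(counts, units))        # total of column VI
    second = sum(c * u * u for c, u in zip(counts, units))   # total of column VII
    mean = first / n
    variance = second / n - mean ** 2
    return mean, math.sqrt(variance)

counts = [1, 4, 10, 4, 1]     # hypothetical multiplicities
units = [-2, -1, 0, 1, 2]     # conventional units
mean_u, sigma_u = grouped_moments(counts, units)

# return to basic units: zero point 1.9 and step 0.2, as for the variable P
zero_point, step = 1.9, 0.2
mean_basic = zero_point + step * mean_u
sigma_basic = step * sigma_u
```

A poorly chosen zero point changes only the size of the intermediate numbers, not the final mean or mean square deviation in basic units.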


*As computations show, such a choice is not altogether felicitous, but this is of no
great significance because if the point taken as the zero point is poorly chosen, the
calculations are made only slightly more complicated.




To calculate the correlation coefficient from a table of type C and to calculate the
coefficients of regression, we must evaluate the double sum

$$\sum_{k}\sum_{l} n_{kl}\, x_k y_l = \sum_{l} y_l \Bigl(\sum_{k} n_{kl}\, x_k\Bigr).$$

This could be calculated directly by multiplying every number in the matrix III by the
values of $x_k$ and $y_l$ corresponding to this number and adding all the elements of the
matrix thus formed. Since in so doing we would need to write down the transformed
correlational table and since there would not be a check, we usually prefer another
method for making the calculations. The summation is made first over one argument
and then over the other. By reversing the order of the arguments, we can obtain the
double sum a second time. Since all the operations are carried out in a formally exact
manner (that is, with no approximations), the second sum must be exactly equal to the
first and this ensures a reliable check on the calculations. Let us examine in detail the
calculations of columns VIII and IX, which give the double sum. As indicated in the
column head of column VIII, the numbers $n_{kl}$ in each row are multiplied by the
corresponding values of $x_k$ and the resulting products are added. These operations may be
done in one’s head or, if the numbers $n_{kl}$ are great, on a calculating machine. In the
present problem, all these operations can be done in one’s head. The first row is


$$2 \times (-4) = -8;$$

the second row is

$$2 \times (-4) + 4 \times (-3) = -20, \ldots;$$

the sixth row is

$$4 \times (-2) + 16 \times (-1) + 143 \times 0 + 14 \times 1 = -10.$$


After multiplying each of the numbers in column VIII by $y_l$, we obtain column IX.
Addition of the numbers in column IX gives the double sum above. In just the same way,
rows (8) and (9) give the same sum, but the summation is made first over y and then over
x. For example, the number 49 in row (8) is obtained thus:


$$1 \times 0 + 14 \times 1 + 16 \times 2 + 1 \times 3 = 49.$$


If we multiply by +1, we obtain the result of summing over y when x = 1. Addition of
all the numbers in the ninth row completes the calculation of the double sum carried out
first over y and then over x. Exactly the same number should be obtained in both cases.
Columns VIII and IX and rows (8) and (9), together with the sixth and seventh lines, give
all the necessary parameters of the empirical distribution of the set of two variables
(under the condition that it is sufficient to examine the linear equations of regression).
With this, we may conclude the work from the diagram and turn to the calculation of the
coefficients of regression, to the derivation of the equations of regression, and to the
calculation of the correlation coefficient.

The tenth and eleventh lines in the diagram are filled out when it is necessary to
compare the empirical regression with the theoretical (linear) regression. As indicated
in the heading of the tenth lines, every number in the eighth line must be divided by the
corresponding number in the fifth line. Since, for example, $n_{0l}$ is the number of stars
corresponding to a given value of $y_l$, the ratio $n_{kl}/n_{0l}$ is the empirical
probability of every $x_k$ under the same conditions. Therefore, the numbers in column X
represent the weighted means of the values of x for consecutive values of y. This means
that we have obtained the empirical regression of X with respect to Y. Analogously,
row (10) gives the sequence of average values of y for given values of x, that is, the
empirical regression of Y with respect to X. It should be recalled that in making these
calculations we assume that we have $n_{kl}$ stars for which the values of the variables
are equal to $x_k$ and $y_l$.

The eleventh lines give the values of the averages of X for successive values of $y_l$
and of the averages of Y for different values of $x_k$, calculated from the equations of
regression. Comparison of the empirical regression with the regression calculated from
the equations of regression is shown clearly in Figure 18. The empirical relationship is
indicated by the broken (polygonal) lines drawn through the points representing the
numbers in the tenth lines (columns and rows).
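The two orders of summation and the empirical regression of column X can be sketched as follows. The small table is hypothetical and the variable names are assumptions made for the illustration; only the last line repeats the sixth-row arithmetic quoted in the text.

```python
# hypothetical correlation table n[l][k]: rows indexed by y_l, columns by x_k
n = [
    [2, 1, 0],
    [3, 5, 1],
    [0, 4, 2],
]
x = [-1, 0, 1]   # conventional units of X (columns)
y = [-1, 0, 1]   # conventional units of Y (rows)

# columns VIII and IX: sum over x first, then over y
col_viii = [sum(n[l][k] * x[k] for k in range(len(x))) for l in range(len(y))]
sum_rows_first = sum(col_viii[l] * y[l] for l in range(len(y)))

# rows (8) and (9): sum over y first, then over x
row_8 = [sum(n[l][k] * y[l] for l in range(len(y))) for k in range(len(x))]
sum_cols_first = sum(row_8[k] * x[k] for k in range(len(x)))

# the check described in the text: both orders must give the same number
sums_agree = (sum_rows_first == sum_cols_first)

# column X: empirical regression of X on Y (mean of x for each value y_l)
row_totals = [sum(row) for row in n]
col_x = [col_viii[l] / row_totals[l] for l in range(len(y))]

# the sixth-row entry of column VIII quoted in the text
sixth_row = 4 * (-2) + 16 * (-1) + 143 * 0 + 14 * 1
```

Because the arithmetic is exact in integers, any disagreement between the two sums points to a slip in the table rather than to rounding.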




Fig. 18. The empirical (solid lines) and theoretical (dashed lines) lines of
regression close to the center of distribution. The center of distribution is
indicated by the rosette. The numbers denote the corresponding lines of
regression: 1, y with respect to x and 2, x with respect to y.


Calculation of the Parameters of the Distribution 


(a) The average values. Division of the sum of the numbers in column VI by the total
number of observations gives the average value of Y:

$$\bar{y} = \frac{+197}{376} = +0.524.$$

When we take into account the position of the zero point and the size of the step in the
catalogue units, we obtain

$$\bar{P} = 1.9 + 0.2 \times 0.524 = 2.005.$$

Division of the sum of the numbers in the sixth row by the total number of observations
gives the average value of X:

$$\bar{x} = \frac{-168}{376} = -0.447, \qquad \bar{m} = 5.5 + 1 \times (-0.447) = 5.05,$$

since the step in the table with respect to m is equal to unity and the zero point is taken
at the midpoint of the interval with visible magnitude 5.5.

(b) The mean square deviations. The sum of the numbers in the seventh column
divided by the number of observations gives the initial moment of order zero-two, that
is, the second-order initial moment of the variable Y. When we subtract the square of
the average value of this variable from this moment (in accordance with the variance
formula), we obtain the variance of the variable Y; the square root of the variance gives,
by definition, the mean square deviation:

$$\sigma_y^2 = \frac{505}{376} - (0.524)^2 = 1.068, \qquad \sigma_y = \sqrt{1.068} = 1.033.$$




Analogously, from the seventh row,

$$\sigma_x^2 = \frac{418}{376} - (0.447)^2 = 0.912, \qquad \sigma_x = \sqrt{0.912} = 0.955.$$


(c) The correlation coefficient. Division of the sum of the numbers in column IX (or
row (9)) by the total number of observations gives the initial moment of order one-one
of the given two-dimensional distribution:

$$\nu_{11} = \frac{1}{376} \sum_{k}\sum_{l} n_{kl}\, x_k y_l = 0.559.$$


To obtain the central moment of the same order, we must subtract from the initial
moment the product of the average values of the variables in question:

$$\mu_{11} = 0.559 - (-0.447) \times (+0.524) = 0.793.$$

To calculate the correlation coefficient, we form the product of the mean square deviations:

$$\sigma_y \sigma_x = 1.033 \cdot 0.955 = 0.987.$$

We then obtain

$$r = \frac{0.793}{0.987} = 0.803.$$

(d) The coefficients of regression and the equations of regression. We calculate these
in conventional units from the formulas containing the correlation coefficient. As a
preliminary, we find the ratios of the mean square deviations:

$$\frac{\sigma_y}{\sigma_x} = 1.082, \qquad \frac{\sigma_x}{\sigma_y} = 0.924.$$

We then find for the coefficients of regression the values

$$\rho_{yx} = 0.803 \times 1.082 = 0.869, \qquad \rho_{xy} = 0.803 \times 0.924 = 0.742.$$


The equations of regression are as follows.
The regression of y with respect to x:

$$y_x - 0.524 = 0.869 \times (x + 0.447)$$

or

$$y_x = 0.869x + 0.912.$$

(The notation $y_x$ can be replaced with just y, but if we do this, we should not forget the
meaning and origin of the equations of regression.)
The regression of x with respect to y:

$$x_y + 0.447 = 0.742 \times (y - 0.524)$$

or

$$x_y = 0.742y - 0.836.$$


The eleventh column and row of the matrix were obtained from these equations.
The values of x and y in the fourth row and column, respectively, of the matrix were
substituted one after the other into the right members of these equations for that purpose.
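The arithmetic of steps (a) to (d) can be retraced with a short script. This is a sketch for checking the figures, assuming the totals 197, −168, 505 and 418 and the moment 0.559 quoted in the text; the variable names are ours. Since the script carries full precision, its results agree with the three-digit values of the text only to within rounding.

```python
import math

n = 376
sum_y, sum_x = 197, -168      # totals of column VI and row (6)
sum_y2, sum_x2 = 505, 418     # totals of column VII and row (7)
nu11 = 0.559                  # initial moment of order one-one, as quoted

mean_y = sum_y / n
mean_x = sum_x / n
sigma_y = math.sqrt(sum_y2 / n - mean_y ** 2)
sigma_x = math.sqrt(sum_x2 / n - mean_x ** 2)

mu11 = nu11 - mean_x * mean_y          # central moment of order one-one
r = mu11 / (sigma_x * sigma_y)         # correlation coefficient
rho_yx = r * sigma_y / sigma_x         # regression of y with respect to x
rho_xy = r * sigma_x / sigma_y         # regression of x with respect to y

# intercepts of the lines y = rho_yx * x + b_y and x = rho_xy * y + b_x
b_y = mean_y - rho_yx * mean_x
b_x = mean_x - rho_xy * mean_y
```

The values obtained are approximately r = 0.803, rho_yx = 0.870, rho_xy = 0.742, b_y = 0.912 and b_x = −0.836, in agreement with the text to within a unit of the third decimal place.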

(e) The mean square errors in the parameters. The extra calculations to be described
now are carried out in those problems in which the statistical material is a sample
from an infinite general set that obeys a normal law of distribution.

In Sections 99 and 106, we presented without proof formulas according to which the
mean square deviations of the basic parameters of one- and two-dimensional distributions
can be calculated.

Some of these formulas are known to the reader from the theory of random errors.




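1. The mean square errors of the arithmetic means:

$$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}}, \qquad \sigma_{\bar{y}} = \frac{\sigma_y}{\sqrt{n}};$$

in our case $\sqrt{n} = 19.4$. Therefore,

$$\sigma_{\bar{x}} = \frac{0.955}{19.4} = 0.049; \qquad \sigma_{\bar{y}} = \frac{1.033}{19.4} = 0.053.$$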

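2. The mean square errors of the sample mean square deviations:

$$\sigma(\sigma_x) = \frac{\sigma_x}{\sqrt{2n}}, \qquad \sigma(\sigma_y) = \frac{\sigma_y}{\sqrt{2n}}.$$

In our example, $\sqrt{2n} = 27.4$. Therefore,

$$\sigma(\sigma_x) = \frac{0.955}{27.4} = 0.035, \qquad \sigma(\sigma_y) = \frac{1.033}{27.4} = 0.038.$$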


These variables characterize the reliability of the parameters in the one-dimensional
case. The following numbers give an evaluation of the reliability of the parameters
in a two-dimensional distribution.

3. The mean square errors in the equations of regression ($s_y$ and $s_x$ are calculated
from formulas (19.30) given above). In the present case, $r^2 = 0.645$; $1 - r^2 = 0.355$;
$\sqrt{1 - r^2} = 0.596$; $s_y = 1.033 \cdot 0.596 = 0.616$; $s_x = 0.955 \cdot 0.596 = 0.569$.
The numbers $s_y$ and $s_x$ are the mean square errors that are obtained if we compare the
values calculated from the equations of regression with the individual (but complete)
values obtained from observations.

4. The mean square error of the correlation coefficient. From formula (19.33) we
obtain for the distribution in question

$$\sigma_r = \frac{1 - r^2}{\sqrt{n}} = \frac{0.355}{19.4} = 0.018.$$


5. The mean square errors in the coefficients of regression are calculated from
formulas (19.34). The results are

$$\frac{\sqrt{1 - r^2}}{\sqrt{n}} = \frac{0.596}{19.4} = 0.0307, \qquad \sigma(\rho_{yx}) = 1.082 \cdot 0.0307 = 0.033,$$

$$\sigma(\rho_{xy}) = 0.924 \cdot 0.0307 = 0.028.$$
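The five groups of mean square errors can be recomputed together. In this sketch the forms of the formulas are read off from the arithmetic shown above (formulas (19.30), (19.33) and (19.34) themselves are not reproduced in this section), so they should be treated as assumptions; the variable names are ours.

```python
import math

n = 376
sigma_x, sigma_y, r = 0.955, 1.033, 0.803   # values obtained in (b) and (c)

root_n = math.sqrt(n)
root_1_r2 = math.sqrt(1 - r * r)

# 1. errors of the arithmetic means
err_mean_x = sigma_x / root_n
err_mean_y = sigma_y / root_n
# 2. errors of the sample mean square deviations
err_sigma_x = sigma_x / math.sqrt(2 * n)
err_sigma_y = sigma_y / math.sqrt(2 * n)
# 3. errors in the equations of regression
s_y = sigma_y * root_1_r2
s_x = sigma_x * root_1_r2
# 4. error of the correlation coefficient
err_r = (1 - r * r) / root_n
# 5. errors of the coefficients of regression
err_rho_yx = (sigma_y / sigma_x) * root_1_r2 / root_n
err_rho_xy = (sigma_x / sigma_y) * root_1_r2 / root_n
```

Rounded to three decimal places, these reproduce the values 0.049, 0.053, 0.035, 0.038, 0.616, 0.569, 0.018, 0.033 and 0.028 given above.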


SUMMARY OF THE RESULTS 


We write the results of the computation of the parameters of the distribution, taking
into consideration the mean square deviations, just as we did in the theory of random
errors.

A. Summary in conventional units: 


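$$\bar{x} = -0.447 \pm 0.049, \qquad \bar{y} = 0.524 \pm 0.053,$$
$$\sigma_x = 0.955 \pm 0.035, \qquad \sigma_y = 1.033 \pm 0.038,$$
$$\rho_{yx} = 0.869 \pm 0.033, \qquad \rho_{xy} = 0.742 \pm 0.028,$$
$$y_x = 0.869x + 0.912, \qquad x_y = 0.742y - 0.836,$$
$$r = +0.803 \pm 0.018.$$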


B. Summary in basic units. In most problems, we must have the parameters not only
in conventional units, but also in those units in which the investigated quantities are
usually given. The values are measured from the chosen zero point.

The method of changing from conventional units and the conventional zero point to
basic units and a new zero point for a one-dimensional distribution was explained in the
preceding chapter. Therefore, it will be sufficient to show how the change in the
parameters is made in the case of a two-dimensional distribution. Since the coefficients of
regression and the coefficient of correlation depend on the central moments, changing
the zero point has no effect on them and we need to consider only the size of the steps.
The correlation coefficient remains completely unchanged, since this is a dimensionless
quantity, as was shown in the introduction to this section. Therefore, let us consider
only the problem of the change in the coefficients of regression. It is clear from the
expressions for these coefficients


$$\rho_{yx} = r\,\frac{\sigma_y}{\sigma_x}, \qquad \rho_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$


that in converting to basic units, we must multiply the numerator in the first one by the
step of the second variable in basic units, and the denominator by the step of the first
variable. In the second coefficient, the order of the transformations is just the opposite.
Here, the first variable is assumed to be the one that is denoted by x in conventional
units. Let us perform the conversion in the example that we have been studying. The
conventional zero of the variable x corresponds to a value of 5.5 of the visible
magnitude, and the step for this variable is equal to unity; the zero point for the variable
y corresponds to a value of 1.9 of the variable P, and the step of this variable is equal
to 0.2. Therefore,


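$$\bar{m} = 5.5 - 0.447 \times 1 = 5.053, \qquad \bar{P} = 1.9 + 0.524 \times 0.2 = 2.005,$$
$$\sigma_m = 0.955 \times 1 = 0.955, \qquad \sigma_P = 1.033 \times 0.2 = 0.207,$$
$$\rho_{Pm} = 0.869 \times \frac{0.2}{1} = 0.174, \qquad \rho_{mP} = 0.742 \times \frac{1}{0.2} = 3.71.$$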


We transform the equations of regression directly by using the coefficients of regression
and the average values that we have obtained. The equation of regression of P with
respect to m is

$$P_m - 2.005 = 0.174 \times (m - 5.053) \quad \text{or} \quad P_m = 0.174m + 1.126.$$

The equation of regression of m with respect to P is

$$m_P - 5.053 = 3.71 \times (P - 2.005)$$

or

$$m_P = 3.71P - 2.39.$$
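The conversion from conventional to basic units can be sketched as follows. The zero points, steps and conventional-unit results are those quoted in the text; the variable names are assumptions made for the illustration.

```python
# conventional-unit results quoted in the text
mean_x, mean_y = -0.447, 0.524
rho_yx, rho_xy = 0.869, 0.742

zero_m, step_m = 5.5, 1.0     # zero point and step for m (the variable x)
zero_P, step_P = 1.9, 0.2     # zero point and step for P (the variable y)

# average values in basic units
mean_m = zero_m + step_m * mean_x
mean_P = zero_P + step_P * mean_y

# regression coefficients: only the ratio of the steps matters
rho_Pm = rho_yx * step_P / step_m
rho_mP = rho_xy * step_m / step_P

# intercepts of the lines P = rho_Pm * m + c_P and m = rho_mP * P + c_m
c_P = mean_P - rho_Pm * mean_m
c_m = mean_m - rho_mP * mean_P
```

The intercepts differ from 1.126 and −2.39 only in the last digit, because the text rounds the averages and coefficients before substituting them.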


These calculations make it possible for us to draw the following conclusions about
the correlation of the variables P and m. It is clear from the values of the mean square
errors in the parameters that sufficiently accurate values were obtained for all of
them, especially the correlation coefficient. Therefore, the connection between P and
m can be considered (on the basis of the investigated material) as being close to linear
on the average. This is confirmed by a comparison of the numbers in columns X and
XI and also in rows (10) and (11). However, in the case of the rows, we may find more
appreciable deviations on the part of the theoretical averages from the empirical ones
near the left edge of the distribution. The obvious explanation is the small number of
bright stars.

It follows from what has been said that the equations of regression can be used for
calculating the values of a single variable from the values of the other variable. The
results of the calculations give on the average a small discrepancy with the empirical
averages.

It is recommended that the reader make his own comparison of the empirical
regressions with the theoretical ones in basic units. To do this, he will need to set up the
tenth column and row in basic units and to check the eleventh row and column from the
theoretical equations of regression in basic units. If we find the deviations on the part
of the numbers in the tenth column from the numbers in the eleventh column and
calculate the weighted mean square deviation, we obtain an overall evaluation of the quality
of the approximation of the data (on the average) by the equations of regression.


109. AN EXAMPLE OF INVESTIGATING THE CORRELATION 
FROM A SMALL NUMBER OF OBSERVATIONS 


Analysis of a two-dimensional statistical set when the number of observations is small
can be considerably simplified. The example that we are about to consider is taken from
the book by V. I. Romanovskii.




We wish to study the correlation between the amount of precipitation in millimeters
in January for nine consecutive years in Tashkent and at the agricultural experimental
station twelve kilometers away from Tashkent. Since the two stations are close to each
other, we may expect that the relation will be close, but it is necessary to check this
since local peculiarities distorting the relation are possible. The results of the
observations in Tashkent are denoted by x and those at the station by y.

We make a diagram showing the results of observations and certain calculations
made from them:


           x^2      x       xy      y      y^2
          7569     87     7482     86     7396
          2209     47     2632     56     3136
          5476     74     6216     84     7056
          7396     86     6192     72     5184
          1444     38     1786     47     2209
           225     15      255     17      289
          1681     41     1763     43     1849
            64      8      152     19      361
          6241     79     6952     88     7744

Total    32305    475    33430    512    35224


The diagram is simple and the column heads are self-explanatory. Note that the totals
shown at the bottom, which are necessary for what follows, can be obtained on a calculating
machine by the method of accumulation without writing the individual addends. Here,
the calculations are made in detail in order to show completely the order in which the
calculations are made. From the formulas of Sections 103 and 104, we make the following
calculations:


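$$\bar{x} = \frac{475}{9} = 52.8, \qquad \bar{y} = \frac{512}{9} = 56.9,$$

$$\nu_{20} = \frac{32305}{9} = 3589.44, \qquad \nu_{02} = \frac{35224}{9} = 3913.78,$$

$$\mu_{20} = 3589.44 - 52.8^2 = 801.60, \qquad \mu_{02} = 3913.78 - 56.9^2 = 676.17,$$

$$\sigma_x = \sqrt{801.60} = 28.3, \qquad \sigma_y = \sqrt{676.17} = 26.0,$$

$$\nu_{11} = \frac{33430}{9} = 3710, \qquad \bar{x}\,\bar{y} = 52.8 \cdot 56.9 = 3000,$$

$$\mu_{11} = 3710 - 3000 = 710, \qquad \sigma_x \sigma_y = 28.3 \times 26.0 = 736,$$

$$r = \frac{710}{736} = 0.965, \qquad r^2 = 0.931, \qquad 1 - r^2 = 0.069,$$

$$\sigma_r = \frac{0.069}{\sqrt{9}} = 0.023, \qquad r = 0.965 \pm 0.023.$$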


The correlation coefficient is close to unity and its mean square error is very small
despite the very small number of observations. The calculations were first made in a
formally exact manner, that is, as if the values of x and y were exact. Beginning with the
mean square deviations, the calculations are taken to three digits.

The result of the determination of the correlation coefficient shows that the equations
of regression can be used to calculate the values of one variable from the values of the
other with a small mean square error.

It is suggested that the reader derive for himself the equations of regression and
compare the results of calculation from the equations of regression with observations.


Answer (incomplete): $\rho_{yx} = 0.887$, $\rho_{xy} = 1.050$.
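The whole small-sample calculation can be repeated in a few lines. In this sketch the nine pairs are the x and y columns of the table of observations, with digits that are illegible in the printed table restored from the accompanying squares and products; the variable names are ours. The script carries full precision throughout, so its results differ slightly from the three-digit arithmetic of the text.

```python
import math

# January precipitation (mm): x at Tashkent, y at the experimental station
x = [87, 47, 74, 86, 38, 15, 41, 8, 79]
y = [86, 56, 84, 72, 47, 17, 43, 19, 88]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
# central moments obtained from the initial moments
mu20 = sum(v * v for v in x) / n - mean_x ** 2
mu02 = sum(v * v for v in y) / n - mean_y ** 2
mu11 = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y

sigma_x, sigma_y = math.sqrt(mu20), math.sqrt(mu02)
r = mu11 / (sigma_x * sigma_y)
err_r = (1 - r * r) / math.sqrt(n)

# coefficients of regression asked for in the exercise
rho_yx = r * sigma_y / sigma_x
rho_xy = r * sigma_x / sigma_y
```

The results are close to r = 0.965 ± 0.023, and the coefficients of regression agree with the stated answer to within about 0.002.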


BIBLIOGRAPHY 


Part I 


Bezikovich, Ya. S., Priblizhennye vychisleniya (Approximate Cal- 
culations), 6th ed., Gostekhizdat, 1949. 

Krylov, A. N., Lektsii o priblizhennykh vychisleniyakh (Lectures 
on Approximate Calculations), 6th ed., Gostekhizdat, 1954; 
Sobranie trudov (Collected Works), vol. II, part 1, Izdatel’stvo 
Akad. Nauk, USSR, 1949. 

Yakovlev, K. P., Matematicheskaya obrabotka rezul’tatov izmerenii 
(Mathematical Processing of the Results of Measurements), 
2nd ed., Gostekhizdat, 1953. 


Part II 


Blazhko, S. N., Kurs sfericheskoi astronomii (Course in Spherical 
Astronomy), 2nd ed., Gostekhizdat, 1954, ch. II. 

Goncharov, V. L., Teoriya interpolirovaniya i priblizheniya funktsii 
(Theory of Interpolation and Approximation of Functions), 2nd 
ed., Gostekhizdat, 1954. 

Kazakov, S. A., Kurs sfericheskoi astronomii (Course in Spherical 
Astronomy), with supplement on interpolation, 2nd ed., Gos- 
tekhizdat, 1940. 

(See also the bibliography for Part I.) 


Part III 


Bernstein, S. N., Teoriya veroyatnostei (Probability Theory), 4th 
ed., Gostekhizdat, 1946. 

Gnedenko, B. V., Kurs teorii veroyatnostei (Course in Probability 
Theory), 2nd ed., Gostekhizdat, 1954. 

Gnedenko, B. V., and Khinchin, A. Ya., Elementarnoe vvedenie v 
teoriyu veroyatnostei (Elementary Introduction to Probability 
Theory), 3rd ed., Gostekhizdat, 1952. 

Goncharov, V. L., Teoriya veroyatnostei (Probability Theory), 
Oborongiz, 1939. 

Kolmogorov, A. N., Osnovnye ponyatiya teorii veroyatnostei 
(Basic Concepts of Probability Theory), ONTI, 1936. 




Romanovskii, V. I., Elementarnyi kurs matematicheskoi statistiki 
(Elementary Course in Mathematical Statistics), Gosplanizdat, 
1939. 


Part IV 


Idel’son, N. I., Sposob naimen’shikh kvadratov i teoriya 
matematicheskoi nablyudenii (The Method of Least Squares and the 
Theory of Mathematical Analysis of Observations), Geodezizdat, 
1947. 

Kolmogorov, A. N., ‘‘K obosnovaniyu metoda naimen’shikh kvadratov’’ 
(The Basis of the Method of Least Squares), Uspekhi 
matematicheskikh nauk, 1, No. 1, 1946. 

Lakhtin, L. K., Kurs teorii veroyatnostei (Course in Probability 
Theory) Gosudarstvennoe izdatel’stvo, 1924. 

Makover, S. G., ‘‘Reshenie sistemy normal’nykh uravnenii pri 
pomoshchi matrits’’ (Solution of Systems of Normal Equations 
by Means of Matrices), Astronomicheskii Zhurnal, 33, No. 3, 
1956. 

Reznikovskii, P. T., ‘‘Ob odnom variante resheniya sistemy 
normal’nykh uravnenii v metode naimen’shikh kvadratov’’ (A 
Variation in the Solution of Systems of Normal Equations by 
the Method of Least Squares), Soobshch. Gos. astron. instituta 
im. Shternberga, No. 54, 1950. 

Romanovskii, V. I., Osnovnye zadachi teorii oshibok (Basic 
Problems in the Theory of Errors), Gostekhizdat, 1947. 

Semendyaev, K. A., Empiricheskie formuly (Empirical Formulas), 
ONTI, 1933. 

Whittaker, E. and Robinson, G., Mathematical analysis of the 
results of measurements. 

Faddeeva, V. N., Vychislitel’nye metody lineinoi algebry (Computa- 
tional Methods of Linear Algebra), Gostekhizdat, 1950. 

(See also the bibliography to Part I and the books by Gnedenko and 
Khinchin in Part III.) 


Part V 


Cramer, H., Mathematical Methods of Statistics, Princeton Univer- 
sity Press, Princeton, 1946. 

Romanovskii, V. I., Matematicheskaya statistika (Mathematical 
Statistics), ONTI, 1938. 

Slutskii, E. E., Teoriya korrelyatsii i elementy ucheniya o krivykh 
raspredeleniya (The Theory of Correlation and Elements in the 
Study of Distribution Curves), Kiev, 1912. 

Smirnov, N. V., ‘‘Priblizhenie zakonov raspredeleniya sluchainykh 
velichin po empiricheskim dannym’’ (Approximation of the 
Laws of Distribution of Random Variables from Empirical Data), 
Uspekhi matematicheskikh nauk, No. 10, 1944. 

(See also the books by Gnedenko and Romanovskii in Part III). 




APPENDIX 


Table I gives the values of the Laplace-Gauss function, which is 
used as an approximate expression for the probability of the number 
of occurrences in the problem of repeated trials. This function is 
the probability density of a normal distribution with center equal to 
zero and mean square deviation equal to unity. Suppose that n is the 
number of trials, p is the probability of an occurrence in a single 
trial, and k is the given number of occurrences whose probability 
we wish to compute. The calculation is made on the basis of the 
formula 

$$P_{k,n} \approx \frac{1}{\sqrt{npq}}\,\Phi'\!\left(\frac{k - np}{\sqrt{npq}}\right), \qquad q = 1 - p,$$

$$\Phi'(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}.$$
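The use of the Laplace-Gauss function can be illustrated by comparing the approximate formula with the exact probability in the problem of repeated trials. This is a sketch only: the function names and the values n = 100, p = 0.5, k = 50 are assumptions chosen for the example.

```python
import math

def laplace_gauss(z):
    """Density of the normal law with center 0 and unit mean square deviation."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def approx_probability(k, n, p):
    """Approximate probability of exactly k occurrences in n trials."""
    q = 1.0 - p
    s = math.sqrt(n * p * q)
    return laplace_gauss((k - n * p) / s) / s

def exact_probability(k, n, p):
    """Exact binomial probability, for comparison."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

# hypothetical case: 50 occurrences in 100 trials with p = 0.5
approx = approx_probability(50, 100, 0.5)
exact = exact_probability(50, 100, 0.5)
```

For this case the two values agree to about three decimal places; the approximation becomes poorer when np or nq is small.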


Table II gives the approximate values of the probability in the 
case of the problem of repeated trials when the probability of an 
event is small.* In this table, k is the number of occurrences, 
a = np, n is the number of trials, and p is the probability of an 
occurrence in a single trial. In this case, 

$$P_{k,n} \approx P_k(a) = \frac{a^k e^{-a}}{k!}.$$
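Entries of this kind can be recomputed directly, assuming that the tabulated quantity is the Poisson probability with parameter a = np, as the description indicates. The function name and the choice a = 2 are ours.

```python
import math

def poisson_probability(k, a):
    """a**k * exp(-a) / k!, the approximation for rare events (a = n * p)."""
    return a ** k * math.exp(-a) / math.factorial(k)

# probabilities of k = 0, 1, ..., 4 occurrences for a = 2, to three decimals
column_a2 = [round(poisson_probability(k, 2.0), 3) for k in range(5)]
```

The rounded values are 0.135, 0.271, 0.271, 0.180 and 0.090.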


Table III gives the values of the probability integral that are 
used for calculating the probability that a random variable will 
assume a value in a given interval if it obeys a normal law of 
distribution. If $\bar{x}$ is the average value of the random variable X (the 
center of distribution) and $\sigma$ is the mean square deviation, the 
probability of obtaining a value between $\alpha$ and $\beta$ is calculated from 
the formula 

$$P(\alpha < X < \beta) = \Phi\!\left(\frac{\beta - \bar{x}}{\sigma}\right) - \Phi\!\left(\frac{\alpha - \bar{x}}{\sigma}\right),$$

$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_0^z e^{-t^2/2}\, dt.$$

In using this table, we should bear in mind that $\Phi(-z) = -\Phi(z)$. 
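Where the table is not at hand, the probability integral can be evaluated through the error function, since Φ(z) = ½ erf(z/√2) for the integral taken from 0 to z. The sketch below uses this identity; the function names and the example interval are assumptions made for the illustration.

```python
import math

def probability_integral(z):
    """Phi(z) = (1/sqrt(2*pi)) * integral from 0 to z of exp(-t*t/2) dt."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

def interval_probability(alpha, beta, center, sigma):
    """P(alpha < X < beta) for a normal law with the given center and sigma."""
    return (probability_integral((beta - center) / sigma)
            - probability_integral((alpha - center) / sigma))

# probability of a deviation of less than one mean square deviation
within_one_sigma = interval_probability(-1.0, 1.0, 0.0, 1.0)
```

The oddness of the function, Φ(−z) = −Φ(z), is what allows the table to list positive arguments only.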


*The table in B. V. Gnedenko’s book Kurs teorii veroyatnostei (Course in Probability
Theory) was used.




If a number in Table I or II ends with the digit 5 and a plus sign 
appears after it, the number is an approximation that is less than 
the actual value. If the digit 5 is to be discarded, the preceding 
digit should be increased by one unit. If a minus sign appears after 
the 5, the value shown is in excess of the actual value. If the 5 is 
to be discarded, the preceding digit should be left unchanged. 


Table I

[Table I: values of the Laplace-Gauss function. The tabulated values are not recoverable from the scan.]


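Table II (continued)

[Surviving fragment. The column headings a = 2, 3, 5 are inferred from the values; the remaining columns and the entries for a = 5, k < 4 are missing.]

 k     a = 2     a = 3     a = 5
 0     0.135+    0.050
 1     0.271     0.149
 2     0.271     0.224
 3     0.180     0.224
 4     0.090     0.168     0.175+
 5     0.036     0.101     0.175+
 6     0.012     0.050     0.146
 7     0.003     0.022     0.104
 8     0.001     0.008     0.065+
 9     0.000     0.003     0.036
10               0.001     0.018
11               0.000     0.008
12                         0.003
13                         0.001
14                         0.000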


Table III 


49903 49931 49952 
49984 49989 49993 


499968 
499997 
49999997 


INDEX 


A posteriori probabilities, definition of 109 
A priori probabilities, definition of 109 
Absolute error 4 
see also Limiting absolute error 
Addition 


error in 14-18 
limiting absolute error for 15 
of expectations of discrete random 
variables 133 
of probabilities 102, 114 
statistical estimate of error of 
16 
Algebraic polynomials 48 
Analytic expression 45 
Angle 
determination 
from logarithm of trigonometric function 33 
from trigonometric function 29 
limiting error 32 
Approximate calculations 
fundamental problems in theory of 3 
inverse problem of 39-41 
Approximate numbers 
estimation of errors of 3-13 
exact error of 4 
operations with 1 
Approximate values 
limiting absolute errors of 41 
limiting relative errors of 41 
measured quantity 219 
Approximation of tabulated functions 45-49 
Arithmetic mean, mean square error of 215 
Asymmetry 181 
Average values, calculation of 331 


Bayes' formula 111 
corollary to 214, 218, 227, 231, 240, 242 
Bayes' theorem 110 
Bernoulli's theorem 144-46, 156-57 
examples of application of 146-48 
Bernstein, S.N. 336 
Bessel's formula 88-93 
first variant 88 
second variant 91 
Bezikovich, Ya. S. 21, 336 
Bézout's theorem 65 
Binomial expansion, Newton's 115 
Binomial probability distribution 
see Probability distribution 
Blazhko, S.N. 36, 336 


Calculating machines 21, 23, 68 
Center of distribution 164, 167-68, 
189, 313 
Center of normal distribution 197 
Central moment of order of random 
variables 165 
Charlier curve 295 
type-A 181 
Charlier distribution 304 
Charlier function 295 
Charlier's law, probability density 
of 182 
Chebyshev-Markov lemma 142, 144, 152 
Chebyshev's inequalities 17, 268, 291 
first 153 
second 153 
Chebyshev's theorem 157 
corollary to 155, 157, 219 
inequalities and 152 
proof of 153 
Coefficient of regression 195, 200, 
320, 321 
average error of sample 326 
calculation of 332 
mean square error in 333 
Components of composite event, definition of terms 104 
Composite event, definition of term 104 
Conditional equations 237-39 
definition of 237 
linear 247-50 
example of procedure for solving 269 
reduction of nonlinear to linear 243-47 
reduction of unequally precise to equally precise 241-43 
Conditional expectation 192 
Conditional probability 
definition of term 104 
of one event given another event 106 
Conditional probability density 189-92, 195 
definition of 190 
Confidence limits 
calculation of 303 
definition of 303 
Confidence probability 
calculation of 303 
definition of 303 
Continuous distribution, graph of 305 
Continuous empirical distributions 292-98 
Continuous random variables 159-86 
calculation of probability 175 
correlated 187, 190 
definition of term 130 
distribution function of 159-61 
expectation of 163 
independent 187 
joint probability density of two 187 
joint probability distribution of two 187-201 
moments of 164-66 
normal distribution of two 192-95 
variance of 165 
Coordinates of sun relative to center of earth 47 
Correlation 
field of 313 
from small number of observations 334 
in case of large number of observations 327 
Correlation coefficient 320-24 
average error of sample 325 
bounds for values of 324 
calculation of 329, 332, 335 
definition of 200 
linear, definition of 322 
mean square error of 333, 335 


Correlation theory, of two variables 311-35 
probabilistic theory of 327 
Correlational dependence 313, 322, 323 
Correlational tables 313, 314 
Cos x 28 
Cot x 28 

Covariance, definition of term 199 
Cramer, H. 337 


Defined quantities, definition of 
term 129 


Dependent events, definition of 
term 106 

Difference interpolation, formula 
for 64 


Difference quotients 
arbitrary order 61 
construction of table of 62 
with variable step 64-65 
first-order 60, 62, 64, 66, 67 
nth degree polynomial 65 
nth-order 62, 64, 65 
second-order 61, 62, 64, 65, 66, 67 
table of 65 
tabulated functions 60-62, 64 
third-order 61, 66 
zeroth-order 66, 67 
Differences 
arbitrary-order 73 
central, of tabulated function with 
constant step 69, 71 
first-order 70, 72 
in inversion of table 74 
inversion of table 74 
minus-first-order 72-73 
minus-second-order 73 
negative-order, of tabulated function with constant step 72 
nth order 70 
of tabulated function 80 
ordinary, basic properties of 73 
polynomial 76 
representation of various orders 
in terms of tabulated values of 
function 73 
second-order 70, 73 
in inversion of table 74 
sum of single-order 75 
table of, check for 76 
effect of error on 76-78 
location of error in 77 
zeroth-order 72 
Differential distribution function 
162, 188 
Differential equation 45 
Differential formula for error estimation 35 
Discrete distribution, graph of 304 
Discrete empirical distribution 287 
Discrete random variables 129-41 
addition of expectations of 133 
definition of 129, 130 
expectation of 131-33 




Distribution 
continuous 292-98, 305 
Maxwellian 184 
of absolute value of velocities 181 
of several random variables 325 
of stellar parallaxes 180 
other than normal 180-86 
Student's 185 
Distribution curves 
Charlier type-A 181 
Pearson's 182 
Distribution function 166, 185 
definition of 159-60 
for normal law 175 
graph of 304 


of continuous random variables 
159-61 
theoretical 298 
Distribution parameters, calculation of 331 
Division, error in 23-25 
limiting relative error for 24-25 


Eccentricity of protractor 205 
Edgeworth 238 
Ellipses of equal probabilities 200 
Empirical distribution 
comparison with normal 302 
comparison with theoretical 298, 306 
continuous 292-98 
of two continuous variables, graphical representation 313 
of two random variables 311 
table 288 
Empirical formulas 273-81 
calculation of parameters in 277-79 
checking of 279 
choice of type of 274-77 
definition of 274 
evaluation of suitability of 280 
illustration of derivation of 280 
linear, for two variables 315-18 
Empirical function, definition of 274 
Empirical sets, graphical representation of 304 
Empirical surface of distribution 313 
Enke's method 266 
Equations 
conditional 
see Conditional equations 
normal 
see Normal equations 
unknowns in, determination by method of least squares 236-72 
Equations of regression 192 
average errors of 324 
calculation of 332 
derivation of linear 318-20 
mean square error in 333 
method of setting up 329 
Errors 
estimation of, differential formulae for 35 
from number of known digits 10 
in point interpolation 54-59 
exact, of approximate numbers 4 
in basic elementary functions 27-34 
in differences in table 76 
in functions of several variables 34-39 
of two variables 34 
with approximate arguments, estimate of 26-41 
in fundamental arithmetic operations 14-25 
limiting absolute 
see Limiting absolute error 
limiting relative 
see Limiting relative error 
of approximate numbers, estimation of 3-13 
of measurement 3 
Euler's integral 184 
Exact error of approximate numbers 4 
Excess 181 
Expectations 
addition of, of discrete random variables 133 
conditional 192 
multiplication of, of independent random variables 135 
of continuous random variables 163 
of discrete random variable 131-33 
of number of occurrences 139 
Exponentials 31 
Extrapolation 47, 55, 274 


Faddeeva, V.N. 260, 337 
Field of correlation 313 
Fischer 327 
Functions 
basic elementary, errors in 27-34 
of several variables, error in 34-39 
of single independent variable, limiting errors of 26-41 
of two variables, error in 34 
with approximate arguments, estimate of error in 26-41 
see also Tabulated functions 


Gauss' curve 174-75, 200, 307 




Gauss' law 192 

Gauss' method 255, 260 
Gauss' notation 248 

Gauss' theorem 264-65, 270 
Gauss-Doolittle method 260 
Gnedenko, B.V. 336, 338 
Goncharov, V.L. 209, 336 


Histogram 313 
construction of 306 
Hypotheses 
a posteriori probability of 108-12 
a priori probability of 108-12 
definition of 107 
use of term 98 


Idel'son, N.I. 237, 266, 337 

Indefinite integral 45 

Independent events, definition of term 106 
Independent random variables, multiplication of expectations of 135 
Inequalities and Chebyshev's theorem 152 
Infinite series 45 
Initial moment of order of random variables 164 
Instruments, errors due to 205 
Integral distribution function 162 
Integral power 22 
Interpolation 47 
backward, Newton's formula for 81, 93 
difference, formula for 64 
forward, Newton's formula for 80, 93 
from table with constant step 69-94 
with variable step 60-68 
of function of single variable 49 
on average 90 
problem of 274, 277 
see also Point interpolation 
Interpolational difference formulas, application of 93-94 
Interpolational error 50 
estimation of 68, 94 
Interpolational formulas 62 
for tables with constant step 78 
Lagrange's, operations involved in 62-64 
Newton's 65-68 
Interpolational polynomial 54 
choice of degree of 64 
determination of 51 
determination of coefficients of 80, 81, 85 
fifth-degree 90 
for periodic function 50 
for table with variable step 66 
formation of coefficients 90 
Lagrange's 51-54 
nth degree 64 
theorem on existence of 49-51 
third-degree 88 
Inverse problem of approximate calculations 39-41 


Joint probability density 188, 189 
of two random variables 191 
of two variables 187 

Joint tables 313 


Kazakov, S.A. 336 

Kepler 273 

Kepler's equation 39 

Khinchin, A. Ya. 336 
Kolmogorov, A.N. 336, 337 
Kolmogorov's criterion 299, 302 
Krylov, A.N. 210, 336 


Lagrange's interpolational formula, operations involved in 62-64 
Lagrange's interpolational polynomial 51-54 
examples of use of 53-54, 57, 58 
Lakhtin, L.K. 183, 337 
Laplace-Gauss function 125 
table of 338 
Laplace's approximation formula 123-26 
for repeated trials 169 
Laplace's limit theorem 148-52, 157 
Large numbers 
law of 142-58, 326 
comments on 156 
Law of large numbers 142-58, 326 
comments on 156 
Law of rare events 127 
Least squares, method of 212-15, 255 
determination of several unknowns in equations by 236-72 
Legendre's principles 236, 248, 249, 250 
determination of parameters in empirical formulas 277-79 
generalization of, statement of 242 
generalization to unequally precise conditional equations 241-43 
probabilistic meaning of 239 
statement of 238 
Lemma, Chebyshev-Markov 142, 144, 152 
Limiting absolute error 5-8 
for addition 15 
for multiplication 20 
general rule for checking 27 
of approximate values 41 
of function of single argument 26 
of two variables 35 
of logarithm 32 
Limiting errors 
of function of single independent variable 26-27 
of point interpolation, determination of 55 
Limiting relative error 8-10 
for division 24-25 
for multiplication 21 
of approximate values 41 
of function of single argument 27 
of logarithm 31 
significant figures, relation with 10-13 
Linear correlation 200 
Lines of regression 192, 195, 197-98, 199 
Logarithm 31 
angle determination from, of trigonometric function 33 
of trigonometric function 32 
Lyapunov's theorem 180, 209 
formation of 168 


Maclaurin series 150 
Makover, S.G. 260, 266, 337 
Markov, A.A. 219 
Markov's criterion 219-20 
Maxwellian distribution 184 
Mean deviation 177 
Mean error 177 
Mean probability density 188 
Mean square deviation 177, 185, 194, 199, 200, 279 
calculation of 331 
of statistical sets 290 
Mean square error 208, 224-25 
calculation of 221-23, 332-33 
evaluation of suitability of empirical formula by 279 
in coefficient of regression 333 
in equation of regression 333 
of arithmetic mean 215 
of correlation coefficient 333, 335 
of individual measurement 216-20 
of measurements of unit weight 229 
of parameters of sample set 308-09 
of unknowns 255, 266 
of weighted mean 228, 233 
per unit weight 232, 266 
Measured quantity 
approximate value of 219 
most probable value of 212, 226 
Measurement 
with rounding off 6 
without rounding off 6 
see also Measurements 
Measurement errors 3, 205 
general remarks on 205-11 
gross 207 
instrumental 205 
personal 207 
random 206 
basic hypothesis in theory of 208 
methods of evaluating 208 
systematic 205 
types of 205-08 
see also Mean square error 
Measurements 
analysis of unequally precise 224-35 
example of 233-35 
concept of unequally precise 224-26 
equal precision of, concept of 212 
example of unequally precise 224 
of fixed quantity 
analysis 212 
analysis of equally precise 212-23 
of unit weight, mean square error of 229 
weighted 224-26 
weighted mean of unequally precise 228 
see also Measurement 
Median 
determination by ogive 305 
in uniform probability distribution 167 
of random variable 161 
Method of determinants 255 
calculation of weights by 263 
Method of equal influences 40 
Method of least squares 212-15, 255 
determination of several unknowns in equations by 236-72 
Method of successive elimination of unknowns 255 
Modulus of precision 209 
Moments of distribution for statistical sets 291 
Moments of normal distribution 178 
Moments of random variables 164-66 
Monde, van der, determinant 50 
Multiplication 
error in 20-23 
limiting absolute error for 20 
limiting relative error in 21 
of approximate numbers 21 
of expectations of independent random variables 135 
of probabilities 104, 114 


Newton 273 




Newton's binomial expansion 115 
Newton's formula 65-68 
examples of use of & 
for interpolating backward 81, 93 
for interpolating forward 80, 93 
Normal distribution 
center of 197 
conditions of applicability of 169 
definition of 169 
function of 175 
moments of 178 
of two continuous random variables 192-95 
probability density of 172, 195-201, 338 
Normal equations 
calculation by determinants 263 
check on setting up 251 
definition of 239 
Gauss' method for solving 264 
linear 247-50 
calculation of weights of unknowns 260-66 
solution of system of 254-60 
Normal law 309 
approximate derivation of 170 
distribution function for 175 
parameters of 172 


Occurrences 
determination of number of 122 
expectation and variance of number 
of 139 
in repeated trials 114-22 
Ogive 305 


Outcome, definition of term 98 


Partial sum of power series 50 
Pearson 170, 299, 326 
Pearson distribution 304 
Pearson distribution curves 182 
Perturbational polynomial 181 
Point interpolation 45-49 
application to arbitrary function 50 
concept of 45-49 
definition of 49 
error estimation in 54-59 
limiting error of, determination of 55 
requirement of 49 
see also Interpolation 


Poisson distribution 127 
Poisson distribution function, table 
of 128 
Polygon, construction of 306 
Polynomial 
algebraic 48 
trigonometric 48 
see also Interpolational polynomial 
Polynomial differences 76 
Power series, partial sum of 50 
Powers 30 
Precision coefficient 300, 302 
Probabilistic significance of elementary theory of correlation 327 
Probabilities 
a posteriori, definition of 109 
of hypotheses 108-12 
a priori, definition of 109 
of hypotheses 108-12 
addition of 102, 114 
calculation from known probabilities of other events 102 
continuous random variable 175 
computation of 185 
multiplication of 104, 114 
of mutually exclusive events 102-04 
of possible number of times of occurrence of an event 123-26 
statistical 156 
Probability 
calculation of 100 
examples of 100-01 
classical definition of 99 
concept of 97 
conditional, definition of term 104 
of one event given another event 106 
confidence 303 
consequences of definition of 100 
determination in statistical sets 288 
in case of repeated trials, table of 338 
of random event 100 
of random variable, table of probability integral for 338 
theory 97 
total, calculation of 107 
definition of 107 
Probability density 161-63, 169, 181, 185 
condition of normalization of 189 
conditional 189-92, 195 
definition of 190 
constant 166 
graph 164, 166 
graph of empirical 305 
joint 188, 189 
of two random variables 191 
of two variables 187 
mean 188 
of Charlier's law 182 
of normal distribution 172, 195-201, 338 
of standard normal law 174 
of two variables 189 
two-dimensional 189 


Probability distribution 
approximating curve for 126 
binomial, general properties of 
117-22 

definition of 115 

examples of 117 

for number of times that event may occur 115-22 

graphical representation 115 

in repeated trials 115 
forms of 118-21 

normal 168 

of two continuous random variables 
187-201 

uniform 166-68 

Probability integral 176 
table of 338 

Probability theory 97 

Protractor, eccentricity of 205 


Random errors in measurements 129 
Random events 113 
Random experiments 97-99, 113 
Random measurement errors 
see Measurement errors 
Random quantities, definition of 
term 129 
Random variables 
central moment of order of 165 
continuous 
see Continuous random variables 
deviations from expectation 136 
discrete 
see Discrete random variables 
empirical distribution of two 311 
independent, multiplication of expectations of 135 
initial moment of order of 164 
median of 161 
moments of 164-66 
mutually independent 131 
probability of, table of probability integral for 338 
types of 129 
variance of, definition and properties of 136 
Rectangle of possible deviations 277 
Regression 
coefficient of 
see Coefficient of regression 
equations of 192 
lines of 192, 195, 197-98, 199 
Repeated trials 113-28 
definition of 113 
formulation of problem 114 
Laplace's approximate formula 169 
occurrences in 114-22 
probability distribution in 116 
forms of 118-21 


Reznikovskii, P.T. 260, 264, 337 
Right-triangle rule 257, 258 
Robinson, G. 337 
Rolle's theorem 55, 56 
Romanovskii, V.I. 185, 260, 269, 285, 327, 334, 337 
Roots 30 
Rounding off 
error resulting from 8 
measurement with 6 
measurement without 6 
rule for 8 


Semedyev, K.A. 337 
Sigma rule 177 
Significant figures, relation with limiting relative error 10-13 
Sin x 28 
Slutskii, E.E. 337 
Smirnov, N.Y. 337 
Standard deviation 177 
Standard normal law of distribution 174 
Statistical material, analysis of 285 
Statistical probabilities 156 
Statistical sets 
analysis of one-dimensional 285-310 
average errors of parameters of sample set 308 
average value of 289 
continuous empirical distributions 292-98 
dimensionless numerical characteristics 290 
general, definition of 286 
graphical representation of 304 
mean square deviation of 290 
median of 289 
moments of distribution for 291 
nature of 285 
overall picture of 285 
sample of existent general set 286 
summary of procedure for analysis of one-dimensional 309-10 
theory of correlation of two variables 311-35 
Stirling's formula 85-87, 123, 124 
example of use of 87 
Student's distribution 185 
Subtraction of close numbers, error of 18-19 
Sun's coordinates relative to center of earth 47 


Tabulated functions 
approximation of 45-49 
constant step 69, 78 
central differences 71 
differences of negative orders 72 
ordinary differences 69 
difference quotients of 60-62, 64 
differences of 80 
inversion of table 74 
periodic function 50 
sum of differences of single order 75 
variable step 60-68 
Tan x 28 
Tashkent observatory, geographic latitude of 221 
Taylor's formula 27 
3-sigma rule 177, 233 
Threshold of sensitivity 3 
Total probability 
calculation of 107 
definition of 107 
Trial, repeated 
see Repeated trials 
Trigonometric function 
angle determination from 29 
logarithm of 33 
logarithm of 32 
Trigonometric polynomials 48 


Uniform probability distribution 166 


Unknowns 
calculation of weights of 260 
in equations, determination by 
method of least squares 236-72 
mean square error of 266 


method of successive elimination 
of 255 
Variance 
of continuous random variables 
165 


of number of occurrences 139 
of random variables, definition and properties of 136 


Weierstrass' theorem 50 
Weight calculations 260-66 
Weight equations 317 
Weighted mean 
mean square error of 228, 233 
of unequally precise measurements 
228 
Weighted measurements 224-26 
Whittaker, E. 337 


Yakovlev, K.P. 38, 336 


